Abstract

Because minorities typically fare poorly on standardized tests, job testing is thought to pose an equity-efficiency trade-off: testing improves selection but reduces minority hiring. We evaluate this trade-off using data from a national retail firm whose 1,363 stores switched from informal to test-based worker screening. We find that testing yielded more productive hires, raising median tenure by 10 percent and reducing the frequency of firing for cause. Consistent with prior research, minorities performed significantly worse on the test. Yet testing had no measurable impact on minority hiring, and productivity gains were uniformly large among minorities and non-minorities. We show formally that these results imply that employers were effectively statistically discriminating prior to the introduction of testing; that is, their screening practices already accounted for expected productivity differences between applicant groups. Consequently, testing improved the selection of both minority and non-minority applicants but did not alter the racial composition of hiring.

JEL: D63, D81, J15, J71, K31, M51

Keywords: Job testing, Discrimination, Economics of minorities and races, Worker screening, Productivity, Personnel economics
1 Introduction

In the early 20th century, the majority of unskilled industrial employees in the United States were hired with no systematic efforts at selection (Wilk and Cappelli, 2003). Sanford Jacoby's well-known industrial relations text describes an early 20th century Philadelphia factory at which foremen tossed apples into crowds of job-seekers and hired the men who caught them (Jacoby, 1985, p. 17). More recently, Murnane and Levy (1996, p. 19) quote a company manager describing Ford Motor Company's hiring process in 1967: "If we had a vacancy, we would look outside in the plant waiting room to see if there were any warm bodies standing there." These hiring practices are no longer commonplace. During the 1980s, as many as one-third of large employers adopted systematic skills testing for job applicants (Bureau of National Affairs, 1980 and 1988). But skills testing has remained rare in hiring for hourly wage jobs, where training investments are typically modest and employment spells brief (Aberdeen, 2001). Due to advances in information technology, these practices are now poised for change. Employers increasingly use computerized job applications and assessments to administer and score personality tests, perform online background checks, and guide hiring decisions. Over time, these tools are likely to become increasingly sophisticated, as has occurred, for example, in the consumer credit industry.

Widespread use of job testing has the potential to raise aggregate productivity by improving the quality of matches between workers and firms. But there is a pervasive concern, reflected in public policy, that job testing may have adverse distributional consequences, commonly called 'disparate impacts.' Because of the near-universal finding that minorities, less-educated, and low-socioeconomic-status (SES) individuals fare relatively poorly on standardized tests (Neal and Johnson, 1996; Jencks and Phillips, 1998), job testing is thought to pose a trade-off between efficiency and equity: better candidate selection comes at the cost of reduced opportunity for groups with lower average test scores (Hartigan and Wigdor, 1989; Hunter and Schmidt, 1982).¹ This concern is forcefully articulated by Hartigan and Wigdor in the introduction to their influential National Academy of Sciences report, Fairness in Employment Testing (p. vii):

"What is the appropriate balance between anticipated productivity gains from better employee selection and the well-being of individual job seekers? Can equal employment opportunity be said to exist if screening methods systematically filter out very large proportions of minority candidates?"

This presumed trade-off has garnered substantial academic, legal, and regulatory attention, including a landmark Supreme Court decision limiting the use of employment tests that are not directly job-relevant (Griggs v. Duke Power Co., 3 FEP Cases 175, 1971), a series of Equal Employment Opportunity Commission guidelines regulating employee selection procedures (U.S. Department of Labor, 1978), and two National Academy of Sciences studies evaluating the efficacy and fairness of job testing (Hartigan and Wigdor, 1989; Wigdor and Green, 1991).²

¹ Jencks and Phillips (1998) report that in 1986, the mean black-white test score gap on the Armed Forces Qualification Test (an IQ test) was 0.7 to 0.9 standard deviations.

Yet despite a substantial body of research and policy, the evidence for an equity-efficiency trade-off in job testing is not well established. As our illustrative model below demonstrates, the presumed trade-off rests on two assumptions, and these assumptions are not equally palatable. The first assumption is that employment tests provide a valid predictor of worker productivity; if so, testing has the potential to improve applicant selection.³ The second assumption is that, absent job testing, firms hire in a manner that is blind to, or only weakly correlated with, the tested attribute; if so, testing will reduce hiring rates from demographic groups with below-average test scores (a disparate impact).

Because competitive employers face a strong incentive to select and remunerate workers according to productivity, a setting in which hiring is blind to an important productive characteristic appears artificial.⁴ Consider instead a case in which firms screen informally for a tested attribute and testing improves the accuracy of that screening. Will the resulting gain in screening precision reduce hiring from low-scoring groups? As we show below, the answer is ambiguous without further assumptions: hiring rates from groups with low scores could rise or fall slightly. If firms already screen imperfectly for a tested attribute, improved precision has no intrinsic implications for the relative hiring of different worker groups.

One special case is of particular economic interest, however. Economists have long recognized that profit-maximizing employers face an incentive to statistically discriminate, that is, to use group demographic characteristics such as education, gender, or race to assess the expected productivity of job applicants (Phelps, 1972; Aigner and Cain, 1977). Statistical discrimination implies that firms hold rational expectations: conditional on observable characteristics, their assessments of individual productivity are unbiased. In this case, disparate impacts from testing on productivity are especially likely to be small. The reason, as we show below, is that under statistical discrimination, firms equate the productivity of marginal hires from each applicant group, both with and without the use of testing. Consequently, gains from testing accrue primarily from better selection within applicant groups (i.e., minorities, non-minorities) rather than from shifts in the racial composition of hiring.

² In Griggs, the Supreme Court found that the firm's requirement that job applicants hold a high school diploma constituted an artificial barrier to the hiring of black workers. The Court held that any requirement used in an employment selection procedure is by law a "test," and that all such tests must be job-related.

³ In an exhaustive assessment, Wigdor and Green (1991) find that military recruits' scores on the Armed Forces Qualification Test (AFQT) accurately predict their performance on objective measures of job proficiency. Similarly, based on an analysis of 800 studies, Hartigan and Wigdor (1989) conclude that the General Aptitude Test Battery (GATB), used by the U.S. Employment Service to refer job searchers to private sector employers, is a valid predictor of job performance across a broad set of occupations. The personnel psychology literature also finds that commonly administered personality tests based on the "five factor model" are significant predictors of employee job proficiency across almost all occupational categories (Barrick and Mount, 1991; Tett, Jackson and Rothstein, 1991; Goodstein and Lanyon, 1999).

⁴ Some researchers have recognized that this assumption is problematic. Hartigan and Wigdor (1989, chapter 12) critique Hunter and Schmidt's (1982) widely cited analysis of the potential economic gains from job testing, noting that Hunter and Schmidt's results depend upon the unrealistic assumption that, absent testing, worker assignment is random.

The preceding discussion suggests that the trade-off between efficiency and equity in hiring is an empirical possibility rather than a theoretical certainty. Evaluating this trade-off requires comparing the hiring and productivity of comparable workers hired with and without employment testing at comparable employers. To our knowledge, no prior research performs this comparison.⁵ In this paper, we empirically evaluate the consequences of private sector applicant testing for minority employment and productivity. We study the experience of a large, geographically dispersed retail firm whose 1,363 establishments switched from informal, paper-based hiring methods to a computer-supported screening process during 1999 and 2000. Both hiring methods use face-to-face interviews, while the electronic assessment tool also places substantial weight on a computer-administered personality test. We use the rollout of this technology over a twelve-month period to contrast contemporaneous changes in productivity and minority hiring at establishments differing only in whether or not they adopted employment testing in a given time interval.

We find strong evidence that testing yielded more productive hires, increasing median employee tenure by 10 percent and slightly lowering the frequency at which workers were fired for cause. Consistent with a large body of work, analysis of applicant data reveals that minority and low-SES applicants performed significantly worse on the employment test. Had managers initially been hiring unsystematically (i.e., in a manner uncorrelated with the test), simple calculations suggest that testing would have lowered minority hiring by approximately 10 to 25 percent. This did not occur. We find no evidence that employment testing changed the racial composition of hiring at this firm's 1,363 sites. Moreover, productivity gains were uniformly large among both minority and non-minority hires. The combination of uniform productivity gains and a lack of adverse hiring impacts suggests that employers were effectively statistically discriminating prior to the introduction of employment testing.
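The statistical discrimination argument can be illustrated with a small Monte Carlo sketch. It is not the authors' model or estimator, and every number in it is an assumption chosen for exposition: productivity is normally distributed in two equal-sized groups whose means differ by 0.2 standard deviations, firms see a normally distributed signal whose noise shrinks when testing is introduced, and a fixed 9 percent of applicants are hired. The function names (`draw_applicants`, `posterior_mean`, `hire_top`, `summarize`) are hypothetical. A firm that statistically discriminates ranks applicants by expected productivity given group and signal, shrinking the signal toward the group mean; the group-blind benchmark hires either at random (the 'unsystematic' pre-testing case) or by the raw signal alone.

```python
import random
import statistics

# Illustrative sketch only: normal productivity, a normal screening signal, and a
# fixed hire rate are assumptions chosen for exposition, not estimates from the firm.


def draw_applicants(n, mu_a, mu_b, sigma_eta, sigma_signal, seed=0):
    """Simulate n applicants, half in group 'a' and half in group 'b'.

    Productivity eta ~ N(mu_x, sigma_eta^2); the firm observes
    signal = eta + noise with noise ~ N(0, sigma_signal^2).
    Returns a list of (group, eta, signal) tuples.
    """
    rng = random.Random(seed)
    apps = []
    for i in range(n):
        group = "a" if i % 2 == 0 else "b"
        eta = rng.gauss(mu_a if group == "a" else mu_b, sigma_eta)
        apps.append((group, eta, eta + rng.gauss(0.0, sigma_signal)))
    return apps


def posterior_mean(group, signal, mu_a, mu_b, sigma_eta, sigma_signal):
    """E[eta | group, signal]: shrink the signal toward the group mean."""
    prior = mu_a if group == "a" else mu_b
    # Weight on the signal; it rises toward 1 as the signal becomes more precise.
    weight = sigma_eta**2 / (sigma_eta**2 + sigma_signal**2)
    return prior + weight * (signal - prior)


def hire_top(apps, score, hire_rate):
    """Hire the top `hire_rate` share of applicants, ranked by `score`."""
    ranked = sorted(apps, key=score, reverse=True)
    return ranked[: round(hire_rate * len(apps))]


def summarize(hires):
    """Share of hires from group b and mean true productivity of hires by group."""
    share_b = sum(g == "b" for g, _, _ in hires) / len(hires)
    mean_eta = {x: statistics.fmean(eta for g, eta, _ in hires if g == x) for x in ("a", "b")}
    return share_b, mean_eta


if __name__ == "__main__":
    MU_A, MU_B, SIGMA_ETA, RATE = 0.0, -0.2, 1.0, 0.09
    # Statistical discrimination before (noisy signal) and after (precise signal) testing.
    for label, sd in (("informal screen", 1.0), ("with test", 0.5)):
        apps = draw_applicants(100_000, MU_A, MU_B, SIGMA_ETA, sd)
        hires = hire_top(apps, lambda a: posterior_mean(a[0], a[2], MU_A, MU_B, SIGMA_ETA, sd), RATE)
        print(label, summarize(hires))
    # Group-blind benchmark: random (unsystematic) hiring versus ranking on the test alone.
    apps = draw_applicants(100_000, MU_A, MU_B, SIGMA_ETA, 0.5)
    coin = random.Random(1)
    print("random hiring", summarize(hire_top(apps, lambda a: coin.random(), RATE)))
    print("test ranking only", summarize(hire_top(apps, lambda a: a[2], RATE)))
```

Under these assumptions, a more precise signal raises the mean productivity of hires within both groups, and ranking on the signal alone lowers group b's share of hires relative to random hiring. Under statistical discrimination, by contrast, the change in group b's share depends on the parameter values and can go in either direction.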

Our paper is related to a broad theoretical and empirical literature on the economics of worker screening. Key theoretical contributions include Spence (1973), Stiglitz (1975), and Salop and Salop (1976), who analyze models of screening, signaling, and self-selection, and Phelps (1972) and Aigner and Cain (1977), who provide the classic theoretical treatments of statistical discrimination. A number of recent studies assess the role of race in employers' hiring decisions. Altonji and Pierret (2001) develop a dynamic learning model to test for employer statistical discrimination in a longitudinal panel of worker earnings and find little evidence of race-based statistical discrimination.⁶ Holzer, Raphael, and Stoll (2002) analyze the effect of employer-initiated criminal background checks on the likelihood that employers hire black workers and conclude that, in the absence of criminal background checks, employers statistically discriminate against black applicants. Bertrand and Mullainathan (forthcoming) conduct an audit study of employer callback rates for job applications. They find that applicants with 'black-sounding' names receive significantly fewer callbacks than applicants with 'white-sounding' names, a result that is potentially consistent with either taste-based or statistical discrimination.⁷

Our analysis is most closely related to studies of ability testing used for military selection. Eitelberg et al. (1984) provide a comprehensive history of ability testing in the U.S. military and discuss its implications for racial composition.⁸ Wigdor and Green (1991) provide the definitive validation study of the Armed Forces Qualification Test (AFQT) as a predictor of soldiers' in-field performance. Closest in spirit to our paper, Angrist (1993) analyzes the impacts of successive increases in the military's AFQT qualification standard on military recruiting and finds that increases in screening stringency differentially reduce minority enlistment.⁹

Our study differs from the existing literature in several respects. First, distinct from the large literature on the use of testing for military selection and public sector job placement, we study testing at competitive, private sector employers. Since private sector employers may face greater pressure than public agencies to screen workers optimally, we view the private sector setting as particularly interesting. Second, whereas almost all prior work evaluates the effect of race on hiring in a static employment setting, that is, one where screening policies are fixed, the rollout of testing at the 1,363 stores in our sample provides a unique opportunity to analyze how testing changes hiring in a previously informal hiring environment. A final unusual feature of our study is that we are able to extend the analysis beyond the hiring phase to evaluate impacts on the productivity of hires, as measured by turnover and firing for cause. As we show below, these two outcomes, hiring and productivity, are closely linked theoretically and hence provide complementary evidence on the consequences of job testing for employee selection.

⁵ Although a large literature evaluates the likely impacts of testing on private sector hiring, all studies that we are aware of compare anticipated or actual hiring outcomes using an employment test to a hypothetical 'unsystematic hiring' case in which no alternative formal or informal applicant screen is used. As noted above, we view this hypothetical case as unlikely.

⁶ See also the closely related learning model by Farber and Gibbons (1996).

⁷ See also Fryer and Levitt (2004) on the importance of distinctively black names. Closely related to these studies, Giuliano (2003) finds that nonblack managers of establishments of a large service sector firm are disproportionately likely to hire nonblack workers. Stoll, Raphael and Holzer (2004) find similar results for the hiring of black job applicants at firms in a number of major U.S. cities. Using data from a single firm, Levine, Leonard and Giuliano (2003) find that dismissals and quits are also higher if managers and subordinates are not of the same race. In a related vein, Montgomery (1991) provides a theoretical model of the use of job referrals for worker selection, and Fernandez and Fernandez-Mateo (2004) analyze the role of employee referral networks in connecting applicants to desirable jobs.

⁸ The United States Military's Alpha literacy exam, initiated during World War I, probably represents the first systematic effort to screen U.S. workers for 'employment.' But it was not until World War II that rigorous employment screening first confronted the issue of equality. In 1940, when the Army began screening draftees for the "ability to read and write English at the fourth grade level," Southern Congressmen pressured the military to relax standards. Because Southern blacks failed the literacy test in large numbers, a disproportionate share of Southern whites was inducted (Eitelberg et al., 1984). Prior to 1940, the standard had been "ability to comprehend simple orders in the English language."

⁹ A key contrast between Angrist's study and our own lies in the way that testing changes the hiring environment. In Angrist (1993), the experimental variation comes from changes in screening stringency. In our study, the variation comes from changes in screening precision with stringency roughly held constant. This difference allows us to analyze how improvements in the employer's information set affect minority and non-minority hiring.

The next section describes our data and details the hiring procedures at the firm under study before and after the introduction of testing. Section 3 offers a model to illustrate how the potential disparate impacts of employment testing on minority hiring and productivity depend on pre-testing hiring practices. Sections 4 and 5 provide our empirical analysis of the consequences of testing for productivity and hiring. Section 6 concludes.

2 Informal and test-based applicant screening at a service sector firm

We analyze the application, hiring, and employment outcome data of a large, geographically dispersed service sector firm with outlets in 47 continental U.S. states. Our data include all 1,363 outlets of this firm operating during our sample period. All sites are company-owned, each employs approximately 10 to 20 workers in line positions, and all offer near-identical products and services. Line positions account for approximately 75 percent of total (non-headquarters) employment and a much larger share of hiring. Line job responsibilities include checkout, inventory, stocking, and general customer assistance. These tasks are comparable at each store, and most line workers perform all of them. Line workers are primarily young, ages 18 to 30, and many hold their jobs for short durations. As shown in the first panel of Table 1, 70 percent of line workers are white, 18 percent are black, and 12 percent are Hispanic. Median tenure of line workers is 99 days, and mean tenure is 174 days (panel B).¹⁰

Worker screening

Prior to June 1999, hiring procedures at this firm were informal, as is typical for this industry and job type. Workers applied for jobs by completing brief paper application forms, available from store employees. If the store had an opening or a potential hiring need, the lead store manager would typically phone the applicant for a job interview and make a hiring decision shortly thereafter. On some occasions, applicants were interviewed and hired at the time of application.


¹⁰ Means exclude incomplete employment spells. Over 98 percent of the spells in our data are complete.


Commencing in June 1999, the firm began rolling out electronic application kiosks provided by Unicru, Incorporated in all of its stores. By June 2000, all 1,363 stores in our sample were equipped with the technology, which supplanted the paper application process. At the kiosk, applicants complete a questionnaire administered by a screen-phone or computer terminal or, in a minority of cases, by a web application. Like the paper application form, the electronic questionnaire gathers basic demographic information such as age, gender, race, education, and prior experience. In addition, applicants sign a release authorizing a criminal background check and a search of records in commercial retail offender databases.

A major component of the electronic application process is a computer-administered personality test, which has 100 items and takes approximately 20 minutes to complete. This test measures five personality attributes that collectively constitute the 'Five Factor' model: conscientiousness, agreeableness, extroversion, openness, and neuroticism. Psychologists widely view these factors as core personality traits (Digman, 1990; Wiggins, 1996). The particular test instrument used by this firm focuses on three of the five traits (conscientiousness, agreeableness, and extroversion), which a large industrial psychology literature has found to be effective predictors of worker productivity, training proficiency, and tenure (Barrick and Mount, 1991; Tett, Jackson, and Rothstein, 1991; Goodstein and Lanyon, 1999).

Once the electronic application is completed, the data are sent to the vendor of the electronic application system, Unicru Incorporated, for automated processing. Unicru's computers transmit the results of processing (typically within a few minutes) to the store's manager by web posting, email, or fax. Two types of output are provided. The first is a document summarizing the applicant's contact information, demographics, employment history, and work availability; this is roughly a facsimile of the conventional paper application form. The second is a 'Hiring Report' that recommends specific interview questions and highlights potential problem areas with the application, such as criminal background or self-reported prior drug test failure. Of greatest interest, the report provides the applicant's computed customer service test score percentile, along with a color code denoting the following score ranges: lowest quartile ('red'), second-to-lowest quartile ('yellow'), and two highest quartiles ('green').¹¹
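The Hiring Report's color coding amounts to a simple mapping from the test score percentile to a band. The sketch below is a minimal illustration, not the vendor's code; `color_band` is a hypothetical name, and the treatment of scores falling exactly on a quartile boundary (the 25th or 50th percentile) is an assumption, since the text does not specify it.

```python
def color_band(percentile: float) -> str:
    """Map a customer service test score percentile (0 to 100) to a Hiring Report color.

    Assumption (not stated in the source): boundary scores go to the higher band,
    so the 25th percentile counts as 'yellow' and the 50th as 'green'.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if percentile < 25:
        return "red"      # lowest quartile: hiring strongly discouraged
    if percentile < 50:
        return "yellow"   # second-to-lowest quartile
    return "green"        # two highest quartiles


for p in (10, 37, 50, 95):
    print(p, color_band(p))
```

Under this rule, scores of 10, 37, 50, and 95 map to red, yellow, green, and green.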

Following the employment testing, hiring proceeds largely as before. Store managers choose whether to offer an interview (sometimes before the applicant has left the store) and, ultimately, whether to offer a job. Managers are strongly discouraged from hiring 'red' applicants, and, as shown in Table 2, fewer than 1 percent of all 'red' applicants are hired. Beyond this near-prohibition, managers retain considerable discretion. There are many more applicants than jobs, and only 8.9 percent of applicants are hired: approximately 1 in 11. Even for those who score well above the 'red' threshold, the customer service test score has substantial predictive power for hiring. As shown in panel C of Table 2, hiring rates rise strongly and monotonically with the test score. Only 1 in 18 of those scoring in the fourth decile (in the 'yellow' range) is hired, compared with 1 in 5 applicants scoring in the highest decile.

¹¹ An identical paper-and-pencil personality test could readily have been used in the pre-electronic application hiring regime. Administering and scoring this test manually would have been time-consuming, however.

Hiring and termination data

Our analysis draws on company personnel records that contain worker demographics (gender, race), hire date, and (if relevant) termination date and termination reason for each worker hired during the sample frame. These data allow us to calculate length of service for employment spells in our sample, 98 percent of which are completed by the close of the sample. We code worker terminations into two groups: neutral terminations and terminations for cause. Neutral terminations include return to school, geographic relocation, or any separation that is initiated by the worker except for job abandonment. Firings for cause include incidents of theft, insubordination, unreliability, unacceptable performance, or job abandonment. In addition, we use data on applicants' self-reported gender, race (white, black, Hispanic, other), and the zip code of the store to which they applied for employment. We merge these zip codes to data from the 2000 U.S. Census of Population Summary Files 1 and 3 (U.S. Census Bureau, 2001 and 2003) to obtain information on the racial composition and median household income in each store's location.
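The termination coding rule can be written as a small classifier. This is a hedged sketch: the reason labels and the `classify_termination` function are hypothetical, since the firm's actual termination codes are not reported. It makes explicit the one subtle case in the rule, namely that job abandonment is worker-initiated yet counted as a firing for cause, and it returns `None` for separations the described rule does not cover.

```python
from typing import Optional

# Reason labels are illustrative; the firm's actual termination codes are not reported.
FOR_CAUSE_REASONS = {
    "theft",
    "insubordination",
    "unreliability",
    "unacceptable performance",
    "job abandonment",  # worker-initiated, but coded as for cause
}


def classify_termination(reason: str, worker_initiated: bool) -> Optional[str]:
    """Return 'for cause', 'neutral', or None if the coding rule does not cover the case."""
    reason = reason.strip().lower()
    if reason in FOR_CAUSE_REASONS:
        return "for cause"
    if worker_initiated:
        return "neutral"  # e.g. return to school, geographic relocation
    return None  # e.g. an employer-initiated separation not listed as a firing for cause


print(classify_termination("Job abandonment", worker_initiated=True))   # for cause
print(classify_termination("Return to school", worker_initiated=True))  # neutral
```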

An important feature of our analysis is that personnel (but not application) records are available for workers hired prior to implementation of the Unicru system at each store. Hence, we build a sample that includes all line workers hired from January 1999, five months prior to the first Unicru rollout, through May 2000, when all stores had gone online. After dropping observations in which applicants had incompletely reported gender or race, we were left with 34,247 workers hired into line positions, 25,820 of whom were hired without use of testing and 8,427 of whom were hired after taking the test.¹²
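The sample construction can be sketched as a filter over personnel records. This is an illustration under stated assumptions, not the authors' code: the `Hire` record, the per-store `rollout` dates, and `build_sample` are hypothetical, and a hire is assumed to count as tested if it occurred on or after the store's rollout date. The toy records are invented solely to exercise the filter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

SAMPLE_START = date(1999, 1, 1)  # five months before the first rollout
SAMPLE_END = date(2000, 5, 31)   # all stores online by the end of May 2000


@dataclass
class Hire:
    store_id: int
    hire_date: date
    gender: Optional[str]  # None if not reported
    race: Optional[str]    # None if not reported


def build_sample(hires: List[Hire], rollout: Dict[int, date]) -> List[Tuple[Hire, bool]]:
    """Return (hire, tested) pairs for line hires in the sample window with complete
    gender and race. Hypothetical rule: a hire is 'tested' if it occurred on or after
    the store's rollout date."""
    sample = []
    for h in hires:
        if not SAMPLE_START <= h.hire_date <= SAMPLE_END:
            continue  # outside the January 1999 to May 2000 window
        if h.gender is None or h.race is None:
            continue  # incompletely reported gender or race
        sample.append((h, h.hire_date >= rollout[h.store_id]))
    return sample


# Toy records (not real data): one store whose kiosk went live on 1 September 1999.
rollout = {1: date(1999, 9, 1)}
hires = [
    Hire(1, date(1998, 12, 15), "F", "white"),    # before the window: dropped
    Hire(1, date(1999, 3, 2), "M", "black"),      # untested hire
    Hire(1, date(1999, 10, 5), "F", "Hispanic"),  # tested hire
    Hire(1, date(2000, 2, 1), "M", None),         # race missing: dropped
]
sample = build_sample(hires, rollout)
print([(h.hire_date.isoformat(), tested) for h, tested in sample])
```

Applied to the firm's records, the untested and tested groups should sum to the reported total (25,820 + 8,427 = 34,247).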

Notably absent from our data are standard human capital variables such as age, education, and earnings. Because most line workers at this firm are relatively young and many have not yet completed schooling, we are not particularly concerned about the absence of these demographic variables. The omission of wage data is potentially a greater concern. Our understanding, however, is that wages for line jobs are largely set centrally and that the majority of these positions pay the minimum wage. We therefore suspect that controlling for year and month of hire, as is done in all models, should purge much of the wage variation in the data.


¹² We closed the sample at the point when all hires were made through the Unicru system. Because the rollout accelerated very rapidly in the final three of twelve months, the majority of hires during the rollout period were non-tested hires. Twenty-five percent of the hires in our sample were made prior to the first rollout.


Applicant test scores

To analyze test score differences in our sample, we draw on a database containing all applications (214,688 total) submitted to the 1,363 stores in our sample during the one year following the rollout of job testing (June 2000 through May 2001). Although we would ideally analyze applications submitted during the rollout, these records were not retained. In Appendix 2, we demonstrate that applicant test scores from this database are highly correlated with the productivity of workers hired at each store before and after the introduction of employment testing (see also Appendix Table 2). This suggests that the applicant sample provides a reasonable characterization of workers applying for work during the rollout period.

As shown in Table 2, there are marked differences in the distribution of test scores among white, black, and Hispanic applicants. Mean black and Hispanic test scores are, respectively, 5.4 points and 3.5 points below the mean score of whites. Kernel density comparisons of standardized raw test scores, shown in Figure 1, underscore the pervasiveness of these differences: relative to the white test score distribution, the black and Hispanic test score densities are visibly left-shifted. These racial gaps, equal to 0.19 and 0.12 standard deviations, accord closely with the representative test data reported by Goldberg et al. (1998).¹³ As we show below, these test score gaps are also economically significant.
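As a rough consistency check, the point gaps can be converted into standard-deviation units. The sketch assumes, which the text does not state, that the 5.4- and 3.5-point gaps are in percentile points and that applicant percentiles are roughly uniform on 0 to 100, whose standard deviation is 100/√12, about 28.9 points. The reported 0.19 and 0.12 figures come from standardized raw scores, so agreement here is only approximate corroboration; `gap_in_sd_units` is a hypothetical helper.

```python
import math

# Standard deviation of a uniform distribution on [0, 100]: 100 / sqrt(12), about 28.87 points.
UNIFORM_PERCENTILE_SD = 100 / math.sqrt(12)


def gap_in_sd_units(gap_points: float, sd: float = UNIFORM_PERCENTILE_SD) -> float:
    """Express a mean score gap (in percentile points) in standard-deviation units."""
    return gap_points / sd


for group, gap in (("black", 5.4), ("Hispanic", 3.5)):
    print(group, round(gap_in_sd_units(gap), 2))
```

This gives 0.19 for the black-white gap and 0.12 for the Hispanic-white gap, in line with the figures reported above.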

Before beginning our empirical analysis of these outcomes, we provide a brief conceptual model to explore the conditions under which disparate impacts are likely to occur.

3 When does job testing have disparate impacts?

How does the introduction of job testing affect the employment opportunities of minority job seekers in a competitive labor market? As discussed in the Introduction, the presumed answer to this question is that testing reduces the labor market opportunities of members of low-scoring groups. Here, we present a brief, illustrative model to explore when this presumption is likely to hold. Our conceptual framework is closely related to well-known models of statistical discrimination by Phelps (1972), Aigner and Cain (1977), Lundberg and Startz (1984), Coate and Loury (1993), and Altonji and Pierret (2001). The contribution of our model is to analyze how an improvement in the employer's information set, that is, a rise in screening precision, affects the employment opportunities and productivity (conditional on hire) of minority and non-minority workers.¹⁵

¹³ Goldberg et al. (1998), using a representative sample of the U.S. workforce, find that, conditional on age, education, and gender, blacks and Hispanics score, respectively, 0.22 and 0.18 standard deviations below whites on the Conscientiousness trait. Blacks also score lower on Extroversion and Hispanics lower on Agreeableness (in both cases significantly), but these discrepancies are smaller in magnitude.

¹⁴ We explored the robustness of these unconditional comparisons by regressing applicant test scores (in percentiles) on dummy variables for race and gender, month × year of application, and store fixed effects. Conditional on gender and month-year of application, black applicants score 5.5 percentiles below white applicants (t = 24). For Hispanics, this gap is 3.6 percentiles (t = 14). When store fixed effects are added, the race coefficients decline in magnitude by about 30 percent and remain highly significant, indicating that minority applicants are overrepresented at stores where white applicants have below-average scores. We also find that, conditional on race and store effects, applicants from high-minority and low-income zip codes have significantly lower test scores than others.

Consider a large set of firms facing job applications from two identifiable demographic groups $x \in \{a, b\}$ that differ only in mean productivity. For simplicity, we assume that a and b applicants each comprise half of the population. Applicants have productivity $\eta_i$, which is distributed $\eta \sim N(\bar\eta_x, \sigma_\eta^2)$ with $\sigma_\eta^2 > 0$, identical for a and b, and $\bar\eta_a > \bar\eta_b$. We can write $\eta = \bar\eta_x + \varepsilon$. Firms in our model have linear, constant returns to scale production technology, a positive discount rate, and are risk neutral. Workers produce output $f(\eta_i) = \eta_i$, in flow terms, which is priced at unity. Job spell durations are independent of $\eta$, and wages are fixed at $w < \bar\eta_a, \bar\eta_b$ (also in flow terms).

Firms in our model do not observe the productivity of individual applicants, $\eta_i$. Instead, they observe group membership, $x_i \in \{a, b\}$, and a noisy productivity signal, $\eta_{0i}$, with $\eta_{0i} = \eta_i + \varepsilon_0$ where $\varepsilon_0 \sim N(0, \sigma_0^2)$ with $\sigma_0^2 > 0$. We think of $\eta_0$ as representing observable applicant attributes, such as attitude, dress and speech, that will not be measured by our data. Job testing in our model provides firms with a second productivity signal, $\eta_{1i}$, which is unbiased and is independent of $\eta_0$ conditional on $\eta$. In particular, $\eta_{1i} = \eta_i + \varepsilon_1$ where $\varepsilon_1 \sim N(0, \sigma_1^2)$ with $\sigma_1^2 > 0$ and $E(\varepsilon_0 \varepsilon_1) = 0$.

Firms in our model employ one worker at a time and search for a replacement when a vacancy opens. While holding a vacancy, firms receive applications drawn at random from the pooled distribution of a and b workers. Firms can choose either to hire the current applicant or to wait a non-zero interval for a new applicant, in which case the prior applicant becomes unavailable. Since wages are fixed, firms strictly prefer to employ workers with higher $\eta$. However, because holding a vacancy forfeits potential profits, firms will apply a selection policy that trades off the costs and benefits of waiting for a superior applicant. As is well understood, this trade-off leads to a threshold rule: firms hire applicants whose expected productivity exceeds an optimally chosen value, and a constant fraction of worker-firm matches lead to hire. We analyze a reduced form version of this setup. Firms in our model select applicants using a hiring threshold, and this produces a constant hire rate of $K > 0$.


A recent paper by Masters (2004) provides a theoretical analysis of the impact of culturally-biased testing on the welfare of minority workers. Within a search framework, Masters finds that a test that is less accurate for minority than non-minority applicants has the potential to reduce the welfare of minority applicants and lower social welfare in aggregate.

Our assumption that the applicant groups differ only in mean productivity is similar to Coate and Loury (1993). Many authors also consider models of statistical discrimination in which testing is differentially informative or uninformative for the minority group (e.g., Aigner and Cain (1977), Lundberg and Startz (1984), Masters (2004)). The evidence in Hartigan and Wigdor (1989), Wigdor and Green (1991) and Jencks and Philips (1989, chapter 2) suggests that tests commonly used for employee selection are equally predictive of job performance for minorities and non-minorities.

As  above,  the  majority  of  line  workers  at  the  establishments  we  study  are  paid  the  minimum  wage. 

To  reduce  the  number  of  cases  considered,  we  also  assume  that  K  <  1/2.  As  above,  fewer  than  1  in  10  applicants  at 


In a complete model, this hiring threshold would depend on technology and labor market conditions. In our reduced form model, the unconditional hiring probability is held constant at $\Pr(H) = K$. This simplification focuses our analysis on the first-order impacts of job testing on the distribution of hiring across applicant types $\{a, b\}$, leaving total employment fixed.

The question we analyze is: does job testing have a disparate impact on the hiring rates and productivity (conditional on hire) of a versus b workers? As we demonstrate, the answer depends on how firms screen applicants in the absence of testing. To highlight the importance of screening practices, we consider three polar cases that span the potential uses of available applicant information. The first is unsystematic selection. Here, firms do not act upon (or, equivalently, do not observe) applicant productivity information (that is, $\eta_0$ and $x$). The second practice is what we term 'naive' selection. In this case, firms select workers using the error-ridden productivity signal, $\eta_0$, but do not adjust for (or do not observe) the additional information conveyed by the applicant's demographic group ($x$). In the third case, firms statistically discriminate by combining information from both $\eta_0$ and $x$ to form 'rational expectations' for worker productivity.

To provide a metric for disparate impact, let $\psi = \Pr(H \mid b) - \Pr(H \mid a)$ equal the expected difference in the hiring rates of b and a applicants, and let $\pi = E(\eta \mid H, b) - E(\eta \mid H, a)$ equal the expected productivity difference between b and a hires. We say that job testing has a disparate impact if it systematically alters $\psi$ or $\pi$, that is, if $\Delta\psi \neq 0$ or $\Delta\pi \neq 0$.

3.1     Unsystematic  selection 

We begin with unsystematic selection. Because all productivity information is ignored in this case, firms hire a representative subset of all applicants, each with probability $K$. (The notion of a screening 'threshold' does not apply here.) Though the unsystematic selection scenario is likely unrealistic, it provides a useful baseline case because it corresponds to the setting primarily considered by the literature on the impact of testing on minority employment (e.g., Hartigan and Wigdor, 1989, and citations therein).

Under unsystematic selection, a and b applicants face equal probability of hire, $\psi_u = 0$. The expected productivity gap between b and a hires is therefore equal to the difference in population means: $\pi_u = \bar\eta_b - \bar\eta_a < 0$.


the  stores  in  our  sample  are  hired. 

Endogenizing  K  in  our  model  would  require  many  additional  assumptions  that  would  detract  from  the  simple  points 
we  wish  to  underscore. 

Note  that  U.S.  employment  law  does  not  permit  use  of  protected  group  membership  (i.e.,  race,  sex,  age  over  40, 
disability,  or  union  status)  as  an  indicator  of  productivity.  Statistical  discrimination  is  probably  difficult  to  detect, 
however,  and  so  may  be  commonplace  in  practice. 

Note that we do not need to assume that firms hire unsystematically along all dimensions; only that any systematic selection is uncorrelated with $\eta$ (and, by implication, with a and b).
We now consider the introduction of job testing in the unsystematic selection environment. Job testing provides firms with an informative productivity signal, $\eta_1$, for each applicant. Per our earlier assumption, firms will apply a selection threshold to the test score, and workers with a value of $\eta_1$ exceeding the threshold will be hired. Let $\kappa_u$ be the selection threshold that solves:

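$$K = \frac{1}{2}\left[\Pr(\eta_1 > \kappa_u \mid x = a) + \Pr(\eta_1 > \kappa_u \mid x = b)\right] = \frac{1}{2}\left[\Phi\left(\frac{\gamma_1(\bar\eta_a - \kappa_u)}{\sigma_\eta}\right) + \Phi\left(\frac{\gamma_1(\bar\eta_b - \kappa_u)}{\sigma_\eta}\right)\right],$$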

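where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, $\sigma_{\eta 1} = (\sigma_\eta^2 + \sigma_1^2)^{1/2}$ is the standard deviation of the test score, $\eta_1$, and $\gamma_1 = \sigma_\eta/\sigma_{\eta 1}$ measures the precision of the test, expressed on the unit interval. Since $\Phi(\cdot)$ is continuous and bounded between 0 and 1, the right-hand side of this equation is continuous and strictly declining in $\kappa_u$, falling from 1 to 0 as $\kappa_u$ rises; the equation therefore has a unique solution for $\kappa_u$.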
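Because the right-hand side of this condition is strictly decreasing in the threshold, $\kappa_u$ can be found by simple bisection. The Python sketch below is illustrative only and is not the code behind Figure 2. The helper names `hire_rate` and `solve_kappa` and the bisection bracket are hypothetical choices. The parameter values ($\sigma_\eta = 1$, $\bar\eta_a = 0.50$, $\bar\eta_b = 0.25$, $K = 0.40$) are those reported for the Figure 2 simulation.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def hire_rate(mean_x, kappa, gamma, sigma_eta=1.0):
    """Pr(eta_1 > kappa | x) = Phi(gamma * (mean_x - kappa) / sigma_eta)."""
    return PHI(gamma * (mean_x - kappa) / sigma_eta)

def solve_kappa(K, mean_a, mean_b, gamma, sigma_eta=1.0, lo=-50.0, hi=50.0):
    """Bisection for the threshold at which the two equal-sized groups
    have an average hiring rate of K (the average falls as kappa rises)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        avg = 0.5 * (hire_rate(mean_a, mid, gamma, sigma_eta)
                     + hire_rate(mean_b, mid, gamma, sigma_eta))
        if avg > K:      # too many hires at this threshold: search higher
            lo = mid
        else:            # too few hires: search lower
            hi = mid
    return 0.5 * (lo + hi)

# Test precision gamma_1 = 0.5, the low end of the range plotted in Figure 2
kappa_u = solve_kappa(K=0.40, mean_a=0.50, mean_b=0.25, gamma=0.5)
```

Because $\kappa_u$ lies above both group means in this parameterization, a more precise test (larger `gamma`) places fewer applicants above any given cutoff. The solved threshold therefore falls as `gamma` rises.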

The consequences of testing for hiring and productivity are summarized in the following three propositions:

Proposition 1    Testing has a disparate negative impact on b hiring ($\Delta\psi_u < 0$).

Because the screening threshold, $\kappa_u$, is identical for both applicant groups and average applicant productivity is higher for a than b applicants, $\Pr(H \mid x = b)$ must decline relative to $\Pr(H \mid x = a)$. Hence, $\Delta\psi_u < 0$. Intuitively, relative to a baseline of unsystematic hiring, testing must reduce hiring from the less qualified group.

Proposition 2    Testing raises the productivity of both a and b hires.

The expected productivity of hired workers from each group $x$ is:

$$E(\eta \mid x, \eta_1 > \kappa_u) = \bar\eta_x + E(\varepsilon \mid x, \eta_1 > \kappa_u) = \bar\eta_x + \gamma_1\sigma_\eta\,\lambda\left(\frac{\gamma_1(\kappa_u - \bar\eta_x)}{\sigma_\eta}\right), \tag{1}$$

where $\lambda(\cdot)$ is the inverse Mills ratio, equal to $\phi(\cdot)/(1 - \Phi(\cdot)) > 0$, with $\phi(\cdot)$ the standard normal density. This expression decomposes the productivity of hires into two components. The first, $\bar\eta_x$, is the expected productivity of a randomly hired applicant from group $x$. The second term, $\gamma_1\sigma_\eta\lambda(\cdot)$, reflects the improvement in selection due to testing. By truncating the lower tail of test-takers (those with $\eta_1 < \kappa_u$), testing increases the expected productivity of hires relative to applicants. This improvement is rising in the precision of the test, $\gamma_1$, and in the stringency of the threshold, $\kappa_u$. Since $\lambda(z) > 0$ for $z \in (-\infty, \infty)$, testing raises the expected productivity of hires from each group.

However,  this  selection  effect  is  not  neutral  for  a  versus  b  productivity. 


We continue to assume that other productivity information ($\eta_0$, $x$) is ignored.

See Prendergast (1999) for a detailed development of the normal selection equations used here. Derivations of all equations in the text are available from the authors.
Proposition 3    Testing reduces the productivity gap between b and a hires ($\Delta\pi_u > 0$).

Differentiating the selection term of equation (1) with respect to the group mean yields $\partial E(\varepsilon \mid x, \eta_1 > \kappa_u)/\partial\bar\eta_x = -\gamma_1^2\,\lambda'(\gamma_1(\kappa_u - \bar\eta_x)/\sigma_\eta) < 0$. Since $\bar\eta_a > \bar\eta_b$ and $\lambda'(\cdot), \lambda''(\cdot) > 0$, this derivative indicates that the productivity gains from testing are larger for b than a hires. By truncating a relatively larger share of the b distribution, testing differentially raises the quality of hires from this group.

In brief, introduction of job testing in an unsystematic hiring environment raises the productivity of hires from both groups, reduces the hiring of b relative to a applicants, and raises the productivity of b relative to a hires. Because testing 'systematizes' an unsystematic hiring environment, these effects are of first order importance. This may be seen in Panel A of Figure 2, which plots a simulation of the impact of testing on the $b - a$ hiring and productivity gaps ($\Delta\psi_u$, $\Delta\pi_u$). Testing yields a discontinuous jump in the relative hiring of a's and in the relative productivity of b's. The more precise the test (that is, the larger is $\gamma_1$), the greater the disparate impact. Thus, consistent with the large literature on testing and race, improved candidate selection in this case comes at a cost of reduced opportunity for groups with lower average test scores.
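The comparative statics behind Panel A can be reproduced in outline. The sketch below is a hypothetical re-implementation, not the authors' simulation code. It uses the Figure 2 parameter values ($\sigma_\eta = 1$, $\bar\eta_a = 0.50$, $\bar\eta_b = 0.25$, $K = 0.40$), solves for $\kappa_u$ at each test precision, and reports $\Delta\psi_u$ and $\Delta\pi_u$ relative to the unsystematic baseline ($\psi_u = 0$, $\pi_u = \bar\eta_b - \bar\eta_a$).

```python
from statistics import NormalDist

STD = NormalDist()

def inverse_mills(z):
    """lambda(z) = phi(z) / (1 - Phi(z))."""
    return STD.pdf(z) / (1.0 - STD.cdf(z))

def solve_kappa(gamma, K, mean_a, mean_b, sigma_eta):
    """Bisection for the common test-score threshold that keeps overall hiring at K."""
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        avg = 0.5 * (STD.cdf(gamma * (mean_a - mid) / sigma_eta)
                     + STD.cdf(gamma * (mean_b - mid) / sigma_eta))
        lo, hi = (mid, hi) if avg > K else (lo, mid)
    return 0.5 * (lo + hi)

def unsystematic_impacts(gamma_1, K=0.40, mean_a=0.50, mean_b=0.25, sigma_eta=1.0):
    """(delta_psi_u, delta_pi_u) relative to the unsystematic baseline,
    where psi_u = 0 and pi_u = mean_b - mean_a."""
    kappa = solve_kappa(gamma_1, K, mean_a, mean_b, sigma_eta)
    psi = (STD.cdf(gamma_1 * (mean_b - kappa) / sigma_eta)
           - STD.cdf(gamma_1 * (mean_a - kappa) / sigma_eta))
    # Only the selection term of equation (1) differs from the baseline gap.
    selection_gap = gamma_1 * sigma_eta * (
        inverse_mills(gamma_1 * (kappa - mean_b) / sigma_eta)
        - inverse_mills(gamma_1 * (kappa - mean_a) / sigma_eta))
    return psi - 0.0, selection_gap

for g in (0.5, 0.75, 1.0):
    d_psi, d_pi = unsystematic_impacts(g)
    print(f"gamma_1={g:.2f}  d_psi_u={d_psi:+.3f}  d_pi_u={d_pi:+.3f}")
```

In this parameterization both impacts grow in magnitude as $\gamma_1$ rises toward 1, which is the pattern described above. The printed values are sketch output and are not read off Figure 2.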

3.2     Naive  selection 

But this result is not general; the discontinuous change in relative hiring and productivity is explained by the fact that, prior to testing, selection is no better than random. We now consider a potentially more realistic setting in which firms apply a uniform selection criterion that is blind to demographic characteristics. Here, firms 'discriminate' on the basis of the productivity information contained in $\eta_0$, but they do not use demographics, $x$, to condition their expectations (i.e., they do not statistically discriminate).

We assume that firms assess expected applicant productivity as $E(\eta \mid \eta_0) = \eta_0$. We refer to this selection rule as 'naive' because a and b applicants with identical signals ($\eta_0$) are treated identically although they do not have comparable expected productivity. (Hence, this is not a rational expectations equilibrium.) This case does, however, roughly comport with what U.S. employment law demands, which is that employers not use protected group membership (in this case, race, represented by $x$) as an indicator of productivity. We therefore consider this a useful example.


Parameter values used for this simulation are: $\sigma_\eta = 1$, $\bar\eta_a = 0.50$, $\bar\eta_b = 0.25$ and $K = 0.40$. The figure plots $E(\Delta\psi_u)$ and $E(\Delta\pi_u)$ for test precision values ranging from $\gamma_1 = 0.5$ to $\gamma_1 = 1.0$.

'Naive' firms in our model take $\eta_0$ at face value; they do not use information about aggregate or group means to assess productivity. A 'quasi-naive' alternative would be for firms to calculate $E(\eta \mid \eta_0) = \Pr(x = a \mid \eta_0) \cdot E(\eta \mid \eta_0, x = a) + \Pr(x = b \mid \eta_0) \cdot E(\eta \mid \eta_0, x = b)$. That is, they attempt to infer demographic group membership, $x$, without using the demographic indicator. This alternative complicates the analysis but does not change the fundamental results.
Let $\kappa_n(\gamma_0)$ be the naive selection threshold that solves:

$$K = \frac{1}{2}\left[\Phi\left(\frac{\gamma_0(\bar\eta_a - \kappa_n(\gamma_0))}{\sigma_\eta}\right) + \Phi\left(\frac{\gamma_0(\bar\eta_b - \kappa_n(\gamma_0))}{\sigma_\eta}\right)\right],$$

where $\sigma_{\eta 0} = (\sigma_\eta^2 + \sigma_0^2)^{1/2}$ and $\gamma_0 = \sigma_\eta/\sigma_{\eta 0}$. We write $\kappa_n(\gamma_0)$ as explicitly depending upon $\gamma_0$ because a change in screening precision holding $K$ constant implies a change in $\kappa_n$, as we show below.

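Under naive selection (and prior to the introduction of testing), the hiring rate from each demographic group, $x$, is $\Pr(H \mid x) = \Phi(\gamma_0(\bar\eta_x - \kappa_n(\gamma_0))/\sigma_\eta)$, and the expected productivity of hires is:

$$E(\eta \mid \eta_0 > \kappa_n, x) = \bar\eta_x + \gamma_0\sigma_\eta\,\lambda\left(\frac{\gamma_0(\kappa_n(\gamma_0) - \bar\eta_x)}{\sigma_\eta}\right).$$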

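Substituting these into the measures of relative hiring and productivity gives:

$$\psi_n = \Phi\left(\frac{\gamma_0(\bar\eta_b - \kappa_n(\gamma_0))}{\sigma_\eta}\right) - \Phi\left(\frac{\gamma_0(\bar\eta_a - \kappa_n(\gamma_0))}{\sigma_\eta}\right) \tag{2}$$

and

$$\pi_n = (\bar\eta_b - \bar\eta_a) + \gamma_0\sigma_\eta\left[\lambda\left(\frac{\gamma_0(\kappa_n(\gamma_0) - \bar\eta_b)}{\sigma_\eta}\right) - \lambda\left(\frac{\gamma_0(\kappa_n(\gamma_0) - \bar\eta_a)}{\sigma_\eta}\right)\right]. \tag{3}$$

We can now analyze how job testing alters $\psi_n$ and $\pi_n$ in the naive hiring environment.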

Testing provides firms with a second applicant productivity signal, $\eta_1$. Since both productivity signals ($\eta_0$ and $\eta_1$) are informative, firms will optimally combine them to assess applicant productivity. We continue to assume that naive firms do not use applicant demographics ($x$) to form expectations. Hence, the best estimate of $\eta$ given $\{\eta_0, \eta_1\}$ is a weighted average of the two signals, where the weights are inversely proportional to the error variances of the signals:

$$E(\eta \mid \eta_0, \eta_1) = \eta_0\,\frac{\sigma_1^2}{\sigma_0^2 + \sigma_1^2} + \eta_1\,\frac{\sigma_0^2}{\sigma_0^2 + \sigma_1^2}.$$

It can be shown that the additional information provided by $\eta_1$ is identically equal to a rise in signal precision from $\gamma_0 = [\sigma_\eta^2/(\sigma_\eta^2 + \sigma_0^2)]^{1/2}$ to:

$$\gamma_2 = \left[\frac{\sigma_\eta^2}{\sigma_\eta^2 + \sigma_0^2\sigma_1^2/(\sigma_0^2 + \sigma_1^2)}\right]^{1/2}. \tag{4}$$

This is equivalent to the population $R$ statistic (i.e., $\sqrt{R^2}$) from a regression (for either demographic group) of $\eta$ on $\eta_0$, $\eta_1$ and a constant.
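Equation (4) and its $\sqrt{R^2}$ interpretation can be checked numerically. In the sketch below, the helper names are hypothetical and the noise levels are chosen arbitrarily. `combined_precision` implements (4), while `population_r` computes $\sqrt{R^2}$ directly from the within-group covariance matrix of $(\eta_0, \eta_1)$ via the 2×2 normal equations.

```python
def precision(sigma_eta, sigma_noise):
    """gamma = sigma_eta / sd(signal) for a signal equal to eta plus independent noise."""
    return sigma_eta / (sigma_eta**2 + sigma_noise**2) ** 0.5

def combined_precision(sigma_eta, sigma_0, sigma_1):
    """Equation (4): precision of the inverse-variance-weighted average of eta_0 and eta_1."""
    combined_noise_var = sigma_0**2 * sigma_1**2 / (sigma_0**2 + sigma_1**2)
    return sigma_eta / (sigma_eta**2 + combined_noise_var) ** 0.5

def population_r(sigma_eta, sigma_0, sigma_1):
    """sqrt(R^2) from the within-group population regression of eta on eta_0, eta_1, constant."""
    v = sigma_eta**2
    s00, s01, s11 = v + sigma_0**2, v, v + sigma_1**2   # Cov matrix of (eta_0, eta_1)
    det = s00 * s11 - s01**2
    explained = v * v * (s00 - 2 * s01 + s11) / det    # c' S^{-1} c with c = (v, v)
    return (explained / v) ** 0.5

gamma_0 = precision(1.0, 1.0)                 # pre-testing precision
gamma_2 = combined_precision(1.0, 1.0, 1.0)   # precision after adding the test
```

Because the combined noise variance is smaller than either $\sigma_0^2$ or $\sigma_1^2$, $\gamma_2$ always exceeds both single-signal precisions.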

This identity is useful because it allows us to assess the consequences of testing by analyzing how a rise in screening precision (from $\gamma_0$ to $\gamma_2$) affects $\psi_n$ and $\pi_n$. Our answers are summarized in the following four propositions:

Proposition 4    Testing causes a downward adjustment to the screening threshold: $\partial\kappa_n(\gamma)/\partial\gamma < 0$.


More  precisely,  they  use  no  information  about  population  means  to  assess  expected  productivity. 
Holding $\kappa_n$ fixed, the hiring odds for each applicant group are declining in screening precision under naive selection: $\partial\Pr(H \mid x)/\partial\gamma = \phi(\cdot)\,(\bar\eta_x - \kappa_n)/\sigma_\eta < 0$. As testing reduces measurement error, it reduces the fraction of workers whose assessed productivity exceeds a given threshold. To maintain overall hiring at $K$, $\kappa_n$ must therefore decline: $\partial\kappa_n(\gamma)/\partial\gamma < 0$.
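The sign in Proposition 4 can be verified by implicit differentiation of the naive threshold condition. In this sketch, $z_x = \gamma(\bar\eta_x - \kappa_n)/\sigma_\eta$ is our shorthand for each group's selection index, and $\phi(\cdot)$ is the standard normal density. We assume, as a sufficient condition for the sign, that $\kappa_n$ lies above both group means:

$$0 = \frac{\partial}{\partial\gamma}\left[\frac{1}{2}\sum_{x \in \{a,b\}} \Phi(z_x)\right] = \frac{1}{2\sigma_\eta}\sum_{x}\phi(z_x)\left[(\bar\eta_x - \kappa_n) - \gamma\,\frac{\partial\kappa_n}{\partial\gamma}\right] \quad\Longrightarrow\quad \frac{\partial\kappa_n}{\partial\gamma} = \frac{\sum_x \phi(z_x)\,(\bar\eta_x - \kappa_n)}{\gamma\sum_x \phi(z_x)} < 0.$$

The threshold therefore falls by a density-weighted average of the gaps $\bar\eta_x - \kappa_n$, scaled by $1/\gamma$.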

Proposition 5    Testing raises hiring of a relative to b applicants ($\Delta\psi_n < 0$).

Holding $\kappa_n$ constant, greater precision reduces hiring of both a and b applicants. This reduction differentially affects b's since, given their lower mean, they benefit disproportionately from measurement error in the pre-testing environment. The decline in the screening threshold more than fully offsets the reduction in hiring for a's but only partly offsets the loss for b's, thereby raising hiring of a's at the expense of b's. (Due to the non-linearity of $\Phi(\cdot)$, these offsetting effects cannot 'wash out' for both groups.) Hence, testing has a disparate negative impact on b hiring: $\Delta\psi_n < 0$. See Appendix 1.

Proposition  6    Testing  raises  the  productivity  of  both  a  and  b  hires. 

Testing raises the odds that a qualified applicant is hired and that an unqualified applicant is rejected; the expected productivity of hires therefore rises. For a applicants, this gain in productivity is partly offset by a rise in their aggregate hiring rate. The net gain for both a's and b's is, however, positive. See Appendix 1.

Proposition 7    Testing raises the productivity of b relative to a hires ($\Delta\pi_n > 0$).

Because selectivity of b's has increased while selectivity of a's has declined, the productivity gap between them contracts: $\Delta\pi_n > 0$. See Appendix 1.

In brief, testing in the naive environment generates disparate impacts comparable in sign to those in the unsystematic selection case. However, the change in screening induced by a rise in precision is small relative to the change caused by a movement from unsystematic to systematic selection: $|\Delta\psi_u| > |\Delta\psi_n|$ and $|\Delta\pi_u| > |\Delta\pi_n|$. This may be seen in Figure 2 by comparing Panels A and B, which plot the disparate impacts of testing in the unsystematic and naive selection cases respectively. The disparate impacts are large and discontinuous in the unsystematic case; they are comparatively small (and continuous) in the naive case.
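A sketch of the Panel A versus Panel B comparison follows. It is illustrative only. The pre-testing precision $\gamma_0 = 0.5$ matches the stated baseline, while the post-testing precision of 1.0 is a hypothetical choice. Under unsystematic selection, hiring on the test alone with precision $\gamma_1$ produces exactly the expressions in (2) and (3) evaluated at $\gamma_1$. The same function (the hypothetical `naive_outcomes`) therefore also supplies the unsystematic benchmark.

```python
from statistics import NormalDist

STD = NormalDist()

def inverse_mills(z):
    """lambda(z) = phi(z) / (1 - Phi(z))."""
    return STD.pdf(z) / (1.0 - STD.cdf(z))

def naive_outcomes(gamma, K=0.40, mean_a=0.50, mean_b=0.25, sigma_eta=1.0):
    """Return (kappa_n, psi_n, pi_n) at screening precision gamma, per equations (2)-(3)."""
    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection: the average hiring rate falls as kappa rises
        mid = 0.5 * (lo + hi)
        avg = 0.5 * (STD.cdf(gamma * (mean_a - mid) / sigma_eta)
                     + STD.cdf(gamma * (mean_b - mid) / sigma_eta))
        lo, hi = (mid, hi) if avg > K else (lo, mid)
    kappa = 0.5 * (lo + hi)
    z_a = gamma * (kappa - mean_a) / sigma_eta
    z_b = gamma * (kappa - mean_b) / sigma_eta
    psi = STD.cdf(-z_b) - STD.cdf(-z_a)                       # equation (2)
    pi = (mean_b - mean_a) + gamma * sigma_eta * (
        inverse_mills(z_b) - inverse_mills(z_a))              # equation (3)
    return kappa, psi, pi

kappa_0, psi_0, pi_0 = naive_outcomes(0.5)   # naive screening before testing
kappa_2, psi_2, pi_2 = naive_outcomes(1.0)   # hypothetical precision after testing
d_psi_n, d_pi_n = psi_2 - psi_0, pi_2 - pi_0

# Unsystematic benchmark at the same post-testing precision:
# the baseline has psi_u = 0 and pi_u = mean_b - mean_a.
d_psi_u, d_pi_u = psi_2 - 0.0, pi_2 - (0.25 - 0.50)
```

In this sketch the naive threshold falls, relative b hiring falls and relative b productivity rises, matching Propositions 4, 5 and 7. Both naive impacts are smaller in magnitude than their unsystematic counterparts.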


Recall that we have assumed that $K < 1/2$, and hence $\kappa_n$ is above the mean of the applicant distribution.
Parameter values used are the same as in Panel A. Here, we show the hiring and productivity gap impacts of a rise in precision, starting from a baseline of $\gamma_0 = 0.5$.




3.3      Statistical  discrimination 

Because $\eta_0$ is an error-ridden measure of applicant productivity, firms can improve screening precision by using demographic group membership as an additional productivity signal, i.e., by statistically discriminating. The implications of statistical discrimination for disparate impacts are quite different from those in the cases above.

Because $\eta$ and $\eta_0$ are jointly normal within each group, the conditional expectation is linear, and statistically discriminating firms will assess applicant productivity as $E(\eta \mid x, \eta_0) = \bar\eta_x + \gamma_0^2(\eta_0 - \bar\eta_x)$. This expression is equal to a convex combination of the group specific mean, $\bar\eta_x$, and the observed applicant signal, $\eta_0$, where the weight given to the signal is increasing in signal precision, $\gamma_0$.
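The statistically discriminating firm's expectation, and the hiring probability it implies, can be derived in three steps from joint normality within group $x$. This is a sketch using only the model's assumptions, together with $\sigma_{\eta 0} = \sigma_\eta/\gamma_0$:

$$E(\eta \mid x, \eta_0) = \bar\eta_x + \frac{\operatorname{Cov}(\eta, \eta_0)}{\operatorname{Var}(\eta_0)}(\eta_0 - \bar\eta_x) = \bar\eta_x + \frac{\sigma_\eta^2}{\sigma_\eta^2 + \sigma_0^2}(\eta_0 - \bar\eta_x) = \bar\eta_x + \gamma_0^2(\eta_0 - \bar\eta_x),$$

$$E(\eta \mid x, \eta_0) > \kappa_s \iff \eta_0 > \bar\eta_x + \frac{\kappa_s - \bar\eta_x}{\gamma_0^2},$$

$$\Pr(H \mid x) = 1 - \Phi\left(\frac{\kappa_s - \bar\eta_x}{\gamma_0^2\,\sigma_{\eta 0}}\right) = \Phi\left(\frac{\bar\eta_x - \kappa_s}{\gamma_0\,\sigma_\eta}\right).$$

The last step is where $\gamma_0$ moves into the denominator of the selection index. Under the naive rule, by contrast, $\Pr(H \mid x) = \Phi(\gamma_0(\bar\eta_x - \kappa_n)/\sigma_\eta)$.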

Using the threshold hiring rule, firms will therefore hire applicants for whom $E(\eta \mid x, \eta_0) > \kappa_s$. This expression implies that statistical discrimination equates the expected productivity of the marginal hire from each group. In contrast to the naive selection case, applicants with identical expected productivity (rather than identical scores) are treated identically.

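Under statistical discrimination, the hiring gap between b and a applicants is:

$$\psi_s = \Phi\left(\frac{\bar\eta_b - \kappa_s(\gamma_0)}{\gamma_0\sigma_\eta}\right) - \Phi\left(\frac{\bar\eta_a - \kappa_s(\gamma_0)}{\gamma_0\sigma_\eta}\right), \tag{5}$$

with productivity gap:

$$\pi_s = (\bar\eta_b - \bar\eta_a) + \gamma_0\sigma_\eta\left[\lambda\left(\frac{\kappa_s(\gamma_0) - \bar\eta_b}{\gamma_0\sigma_\eta}\right) - \lambda\left(\frac{\kappa_s(\gamma_0) - \bar\eta_a}{\gamma_0\sigma_\eta}\right)\right]. \tag{6}$$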


These terms ($\psi_s$ and $\pi_s$) differ from the selection terms for the naive case ($\psi_n$ and $\pi_n$) by only one parameter: the selectivity term, $\gamma_0$, which appears in the denominator of the selection equations above, (5) and (6), and in the numerator of the selection equations for firms using naive selection, (2) and (3). This contrast reflects a difference in how firms use available screening information. Firms using statistical discrimination discount high and low values of $\eta_0$ toward the group specific mean in proportion to measurement error. Lower precision therefore raises selectivity, seen in a reduction in the denominator of the selection equations. By contrast, naive firms make no adjustment for measurement error in observed applicant signals. Consequently, lower precision (more measurement error) reduces selectivity, seen in a reduction in the numerator of the selection equations.

As in the naive case above, job testing in the statistical discrimination setting is identically equal to a rise in screening precision from $\gamma_0$ to $\gamma_2$ (see equation (4)). The impacts of job testing on hiring and productivity are:


U.S. employment law does not permit use of protected group membership (i.e., race, sex, age over 40, disability, or union status) as an indicator of productivity. Statistical discrimination is therefore illegal. In practice, however, it is probably difficult to detect, and so may potentially be commonplace. List (2004) presents evidence from a field experiment that sellers of sportscards statistically discriminate against minority buyers.
Proposition 8    Testing causes an upward adjustment in the screening threshold: $\partial\kappa_s(\gamma)/\partial\gamma > 0$.

Opposite to the naive selection case, hiring odds for all applicants are rising in screening precision, holding $\kappa_s$ constant. Hence, $\kappa_s$ must rise to maintain overall hiring at $K$: $\partial\kappa_s(\gamma)/\partial\gamma > 0$.
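Implicit differentiation delivers the sign in Proposition 8. Let $z_x = (\bar\eta_x - \kappa_s)/(\gamma\sigma_\eta)$ be our shorthand for the statistical-discrimination selection index, and assume, as a sufficient condition, that $\kappa_s$ lies above both group means. Holding $K = \frac{1}{2}\sum_x \Phi(z_x)$ fixed then gives:

$$0 = \sum_x \phi(z_x)\,\frac{\partial z_x}{\partial\gamma} = -\frac{1}{\gamma^2\sigma_\eta}\sum_x \phi(z_x)\left[(\bar\eta_x - \kappa_s) + \gamma\,\frac{\partial\kappa_s}{\partial\gamma}\right] \quad\Longrightarrow\quad \frac{\partial\kappa_s}{\partial\gamma} = \frac{\sum_x \phi(z_x)\,(\kappa_s - \bar\eta_x)}{\gamma\sum_x \phi(z_x)} > 0.$$

Because precision enters this index inversely rather than multiplicatively, the sign is the reverse of the naive case.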

Proposition 9    Testing raises hiring of b relative to a applicants ($\Delta\psi_s > 0$).

Statistically discriminating firms 'discount' applicant signals toward their group-specific means in proportion to measurement error. This practice differentially reduces b hiring since more weight is placed on the (lower) b group mean and less weight on the observed signal. Testing, by reducing measurement error, raises b hiring. See Appendix 1.

Proposition  10   Testing  raises  the  productivity  of  both  a  and  b  hires. 

Testing raises the odds that a qualified applicant is hired and that an unqualified applicant is rejected; the expected productivity of hires rises. For b applicants, this gain in productivity is partly offset by a rise in their aggregate hiring rate. The net gain for both a's and b's is positive. See Appendix 1.

Proposition 11    Testing increases the productivity gap between a and b hires, but this effect is very small ($\Delta\pi_s \approx 0$).

As noted above, statistical discrimination equates the productivity of marginal hires from each group (a and b) at all levels of test precision (provided that both groups are hired). Consequently, any effect of testing on relative a versus b productivity ($\pi_s$) can only arise from changes in the conditional mean gap between inframarginal a and b hires. Under the assumed normality of the a and b productivity distributions, these inframarginal differences are of second order importance, as we show formally in Appendix 1. That is, they arise only from the second derivative of the selection equation ($\lambda''(\cdot)$), which is quite close to zero. The net effect of testing on relative productivity is weakly negative, $\Delta\pi_s \le 0$; in practice, this effect is, to a first approximation, zero: $\Delta\pi_s \approx 0$.

In summary, statistical discrimination has two substantive implications that differ from the prior cases considered. First, testing does not lower (and may raise) the hiring of members of low scoring groups. Second, testing has essentially no disparate impact on the productivity of low relative to high scoring groups. These results are illustrated in Panel C of Figure 2, which plots the change in the $a - b$ hiring and productivity gaps induced by testing. In this simulation, testing's impact on the productivity gap is essentially undetectable.
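The statistical-discrimination case can be simulated directly from equations (5) and (6) with the Figure 2 parameter values. The function name and the two precision levels below are illustrative choices, and the code is not the authors'. The only change from the naive selection index is that precision divides, rather than multiplies, the distance to the threshold.

```python
from statistics import NormalDist

STD = NormalDist()

def inverse_mills(z):
    """lambda(z) = phi(z) / (1 - Phi(z))."""
    return STD.pdf(z) / (1.0 - STD.cdf(z))

def sd_outcomes(gamma, K=0.40, mean_a=0.50, mean_b=0.25, sigma_eta=1.0):
    """Return (kappa_s, psi_s, pi_s) at screening precision gamma, per equations (5)-(6)."""
    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection: the average hiring rate falls as kappa rises
        mid = 0.5 * (lo + hi)
        avg = 0.5 * (STD.cdf((mean_a - mid) / (gamma * sigma_eta))
                     + STD.cdf((mean_b - mid) / (gamma * sigma_eta)))
        lo, hi = (mid, hi) if avg > K else (lo, mid)
    kappa = 0.5 * (lo + hi)
    z_a = (kappa - mean_a) / (gamma * sigma_eta)   # precision now divides the gap
    z_b = (kappa - mean_b) / (gamma * sigma_eta)
    psi = STD.cdf(-z_b) - STD.cdf(-z_a)                       # equation (5)
    pi = (mean_b - mean_a) + gamma * sigma_eta * (
        inverse_mills(z_b) - inverse_mills(z_a))              # equation (6)
    return kappa, psi, pi

kappa_0, psi_0, pi_0 = sd_outcomes(0.5)   # before testing
kappa_2, psi_2, pi_2 = sd_outcomes(1.0)   # hypothetical precision after testing
```

In this parameterization the threshold rises, relative b hiring rises by roughly 0.09, and the change in $\pi_s$ is smaller than 0.001 in absolute value.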


In  the  absence  of  measurement  error,  a  and  b  applicants  are  treated  identically  conditional  on  their  scores. 
3.4     The  productivity  impacts  of  testing  under  a  hiring  quota

As Coate and Loury (1993) discuss, a legal requirement that employers treat minority and non-minority applicants identically conditional on their test scores is likely to be difficult to enforce. A more realistic scenario may be one in which employers are required to maintain a constant hiring rate of minority and non-minority workers, i.e., a quota. Here, we briefly explore how the use of quotas affects our results.

A hiring quota is readily incorporated in our framework as a constraint that $\partial\psi/\partial\gamma = 0$; that is, a change in screening precision must leave the hiring gap between a and b applicants unaffected. Under the maintained assumption that the overall hiring rate is $K$, firms will optimally respond to the quota by setting a separate hiring threshold for each applicant group, $\kappa^x(\gamma)$. This threshold will select the most qualified applicants from each group subject to the constraint that a constant share of applicants from each group is hired. Since, by construction, testing does not affect relative hiring rates under the quota, the relevant question is how testing affects relative productivity.

First, consider the case of naive selection. It is straightforward to show that a rise in screening precision under the hiring quota implies that the hiring threshold, $\kappa_n^x(\gamma)$, must fall by relatively more for b than a applicants. Intuitively, because screening error is differentially beneficial to b group applicants, a reduction in screening error requires a compensating decline in the b group threshold to maintain constant hiring. Some algebra shows that the group-specific change in the hiring threshold is given by $\partial\kappa_n^x/\partial\gamma = (\bar\eta_x - \kappa_n^x(\gamma))/\gamma$, which is larger in absolute magnitude (more negative) for b than a applicants. Using this derivative, we obtain

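$$\frac{\partial\pi_n}{\partial\gamma} = \sigma_\eta\left[\lambda\left(\frac{\gamma(\kappa_n^b(\gamma) - \bar\eta_b)}{\sigma_\eta}\right) - \lambda\left(\frac{\gamma(\kappa_n^a(\gamma) - \bar\eta_a)}{\sigma_\eta}\right)\right] > 0.$$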


This expression is unambiguously positive: increased precision under the quota raises the productivity of b relative to a hires. In fact, this result is comparable to that for the non-quota hiring case. In the non-quota case, however, the change in the productivity differential is augmented by the decline in b hiring (raising selectivity further). In the quota case, this second effect is absent, but the same result holds with smaller magnitude: testing raises the relative productivity of b hires.
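The group-specific threshold adjustment used above follows from holding each group's hiring rate fixed. As a sketch under the naive hiring rule, write $r_x$ for the quota-mandated hiring rate of group $x$ (our notation):

$$\Phi\left(\frac{\gamma(\bar\eta_x - \kappa_n^x(\gamma))}{\sigma_\eta}\right) = r_x \;\Longrightarrow\; \gamma\,(\kappa_n^x(\gamma) - \bar\eta_x) = \sigma_\eta\,\Phi^{-1}(1 - r_x) \;\Longrightarrow\; \frac{\partial\kappa_n^x}{\partial\gamma} = \frac{\bar\eta_x - \kappa_n^x(\gamma)}{\gamma}.$$

The argument of each inverse Mills ratio in equation (3) is therefore constant in $\gamma$ under the quota. Only the prefactor $\gamma\sigma_\eta$ varies, which gives

$$\frac{\partial\pi_n}{\partial\gamma}\bigg|_{\text{quota}} = \sigma_\eta\left[\lambda\left(\frac{\gamma(\kappa_n^b(\gamma) - \bar\eta_b)}{\sigma_\eta}\right) - \lambda\left(\frac{\gamma(\kappa_n^a(\gamma) - \bar\eta_a)}{\sigma_\eta}\right)\right].$$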

Now, consider the case of statistical discrimination. Opposite to the case above, a rise in screening precision under quota-constrained statistical discrimination requires that the hiring threshold rise by more for b than a applicants, with $\partial\kappa_s^x/\partial\gamma = (\kappa_s^x(\gamma) - \bar\eta_x)/\gamma$. Applying this result to the relative


Indeed, the political debate over the U.S. Civil Rights Act of 1991 focused on whether it implicitly required employers to use racial hiring quotas. See Donohue and Siegelman (1991) for a rigorous analysis of the hiring incentives created by the Civil Rights Act of 1964.

We assume the quota is imposed from a starting (non-quota) case where $\kappa_n^a(\gamma) = \kappa_n^b(\gamma)$. This guarantees that $\kappa_n^b(\gamma) - \bar\eta_b > \kappa_n^a(\gamma) - \bar\eta_a$.

We again assume the quota is imposed from a starting (non-quota) case where $\kappa_s^a(\gamma) = \kappa_s^b(\gamma)$. This guarantees that $\kappa_s^b(\gamma) - \bar\eta_b > \kappa_s^a(\gamma) - \bar\eta_a$.


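productivity equation yields,

$$\frac{\partial\pi_s}{\partial\gamma}\bigg|_{\Delta\psi_s = 0} = \sigma_\eta\left[\lambda\left(\frac{\kappa_s^b(\gamma) - \bar\eta_b}{\gamma\sigma_\eta}\right) - \lambda\left(\frac{\kappa_s^a(\gamma) - \bar\eta_a}{\gamma\sigma_\eta}\right)\right] > 0,$$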

which  is  also  positive;  the  use  of  testing  with  the  quota  again  raises  relative  productivity  of  b  hires.  This 
result  contrasts  to  the  pure  statistical  discrimination  case  where  we  found  that  relative  productivity 
effects  are,  to  a  first  approximation,  zero.  The  logic  of  this  result  is  that  a  hiring  quota  negates  the 
key  feature  of  statistical  discrimination,  which  is  that  firms  equate  the  productivity  of  marginal  hires 
from  each  group.  Because  the  hiring  threshold  rises  differentially  for  b  hires  to  maintain  the  quota, 
the  relative  productivity  of  marginal  b  versus  a  hires  rises,  generating  disparate  productivity  impacts. 
Thus, the case of pure (not quota-constrained) statistical discrimination presents one feature not
shared  by  other  cases:  testing  leaves  the  productivity  gap  between  workers  from  different  demographic 
groups  unchanged. 

3.5      Implications 

Only one unambiguous conclusion emerges from our model: testing raises productivity. By contrast,
the widely held presumption that testing reduces hiring of applicants from low scoring groups is not
supported. If firms statistically discriminate, a gain in screening precision may increase hiring from low scoring
groups. If firms do not statistically discriminate, a gain in precision may slightly reduce hiring from
minority groups. Large disparate impacts on hiring are only assured in the extreme case where hiring in
the pre-test environment is entirely uncorrelated with the test measure (e.g., the case of unsystematic
selection).

As our model underscores, the view that testing has disparate impacts on minority hiring has
an  often-overlooked  dual  implication  for  productivity.  If  hiring  policies  do  not  account  for  group 
productivity  differences,  testing  typically  raises  the  relative  productivity  of  minority  hires  even  as  it 
reduces minority hiring. By contrast, if firms account for group differences (i.e., they statistically
discriminate), productivity gains are likely to be relatively uniform among minority and non-minority
hires.  This  distinction  provides  a  useful  point  of  leverage  for  empirically  distinguishing  these  cases. 

Though our model makes many specific assumptions, three results appear quite general. The first,
as above, is that improved selection raises productivity of hires. The second is that, outside the
extreme case of unsystematic hiring, the expected disparate impacts of testing on hiring and productivity
of minority workers are typically ambiguous; improved precision has no intrinsic implications for the
relative well-being of different worker groups. The third is that testing under statistical
discrimination will not generally induce disparate productivity impacts. Because the marginal productivity of
hires across different worker groups is equated, testing, to a first approximation, may leave relative
productivity unaffected.

We now turn to an empirical assessment of the effects of testing on the employment and
productivity of minority and non-minority workers at the approximately 1,400 sites of the firm described
above.

4     Estimating  the  productivity  consequences  of  job  testing 

We begin our empirical analysis by studying the productivity consequences of job testing. As an initial
productivity measure, we analyze the length of completed job spell durations of workers hired with
and without use of job testing. We think of job spell duration as a proxy for reliability; unreliable
workers are likely to quit unexpectedly or be fired for poor performance. In Section 4.2, we also
consider a second productivity measure: firing for cause.

We initially estimate the following difference-in-differences model for job spell duration:

$$D_{ijt} = \alpha + X_{ijt}\beta_1 + \beta_2 T_{ijt} + \theta_t + \varphi_j + \varepsilon_{ijt}. \qquad (7)$$

In this equation, the dependent variable is the job spell duration (in days) of worker $i$ hired at site $j$ in
year and month $t$. The vector $X$ contains worker race and gender, and $T$ is an indicator variable equal
to 1 if the worker was screened via job testing, and 0 otherwise. The vector $\theta$ contains a complete set
of month × year-of-hire effects to control for seasonal and macroeconomic factors affecting turnover.
Most specifications also include a complete set of store site effects, $\varphi$, which absorb fixed factors
affecting job duration at each store. Since outcomes may be correlated among workers at a given site,
we use Huber-White robust standard errors clustered on store and application method ($T \in \{0, 1\}$).
Estimates  are  found  in  Table  3.  The  first  estimate  excludes  both  site  effects  and  the  T  indicator 
variable.  Consistent  with  the  bivariate  comparisons  in  Table  1,  black  and  Hispanic  workers  have 
substantially  lower  conditional  mean  tenure  than  white  employees.  When  1,363  site  fixed  effects 
are  added  in  column  2,  these  race  differences  fall  by  approximately  40  percent  (though  they  remain 
highly  significant),  indicating  that  minority  workers  are  overrepresented  at  establishments  where  both 
minorities  and  non-minorities  have  high  turnover. 
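As a rough illustration of how the site and time effects in equation (7) enter the estimation, the Python sketch below regresses job spell duration on the testing indicator plus full sets of store and month dummies, using simulated data. The data-generating process, the function name `two_way_fe_estimate`, and all parameter values are hypothetical; the sketch omits worker controls and computes only the point estimate, not the clustered standard errors described above.

```python
import numpy as np


def two_way_fe_estimate(y, treat, store, month):
    """OLS coefficient on `treat` with full sets of store and month dummies.

    Store dummies (one per store, no separate intercept) absorb fixed site
    factors; month dummies (first month dropped to avoid collinearity)
    absorb common time shocks. Worker controls are omitted for brevity.
    """
    store_dummies = (store[:, None] == np.unique(store)[None, :]).astype(float)
    month_dummies = (month[:, None] == np.unique(month)[None, 1:]).astype(float)
    X = np.column_stack([treat, store_dummies, month_dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]


# Simulated panel: every number below is hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)
n_stores, n_months, hires_per_cell = 50, 17, 4
store = np.repeat(np.arange(n_stores), n_months * hires_per_cell)
month = np.tile(np.repeat(np.arange(n_months), hires_per_cell), n_stores)
adoption_month = rng.integers(4, 14, size=n_stores)     # staggered rollout
treat = (month >= adoption_month[store]).astype(float)  # T = 1 once the store tests
store_effect = rng.normal(0, 30, size=n_stores)
month_effect = rng.normal(0, 10, size=n_months)
true_effect = 20.0                                      # days of added tenure
duration = (150 + store_effect[store] + month_effect[month]
            + true_effect * treat + rng.normal(0, 40, size=store.size))

beta_hat = two_way_fe_estimate(duration, treat, store, month)
print(f"estimated effect of testing: {beta_hat:.1f} days")
```

Because adoption is staggered across stores in this simulation, the testing indicator varies within stores over time, which is what allows its coefficient to be separated from the store and month effects.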


To  be  clear,  one  can  readily  construct  cases  where  disparate  impacts  occur  (in  either  direction)  by  assuming  large 
cross-group  dissimilarities  between  the  productivity  distributions  of  inframarginal  hires.  But  these  cases  are  likely  to  be 
somewhat  artificial. 

Stores of this firm are typically staffed leanly, with 2 to 4 line workers per shift. Unreliable workers and those who
quit  unexpectedly  inconvenience  customers  by  reducing  staff  availability  and  impose  costs  on  managers  and  coworkers 
who  must  cover  their  shifts. 

Ninety-eight percent of employment spells that commenced during the sample window of January 1999 to May 2000
were  completed  by  the  last  observation  date  in  our  personnel  data  (August  2003).  We  exclude  incomplete  spells  from 
these  OLS  models. 
Columns 3 and 4 present initial estimates of the impact of testing on job spell duration. In column
3, which excludes site effects and race dummies, we find that the employment spells of tested hires are
8.8 days longer than those of non-tested hires (t = 2.0). When site fixed effects are added in column
4, the point estimate rises to 18.8 days (t = 4.6). Adding controls for worker race and gender has
little impact on the magnitude or significance of the job-test effect. When we include state × time
interactions in column 6 to account for differential employment trends by state, the job-test point
estimate rises slightly to 22.1 days.

In net, these models suggest that testing increased mean job duration by approximately 20 days, or
12 percent. This pattern is also clearly visible in Figure 3, which plots the distribution of completed
job spells of tested and non-tested hires. The distribution of spells for tested hires lies noticeably to
the right of that for non-tested hires, and generally has greater mass at higher job durations and lower
mass at shorter durations.

Instrumental  variables  estimates 

Our estimates could be biased if job-test status is endogenous. This endogeneity might take two
forms. A first concern is that, in the 1 to 2 months following the rollout of testing at a site, 10 to 25
percent of new hires in our data are not tested. There are three reasons why this
may occur. First, individuals who apply prior to the advent of testing are often not on the payroll for
several weeks; they will appear as non-tested, post-testing hires in our data. Second, operational and
training issues in the weeks following the Unicru installation may cause the online application system
to be unavailable or unused. Third, managers might deliberately circumvent testing to hire preferred
candidates.

To purge the possible endogeneity of tested status among hires at a store using the test, we
re-estimate equation (7) using a dummy variable indicating store-test-adoption as an instrumental
variable for the tested status of all applicants at the store. Since we do not know the exact installation
date of the electronic application kiosk at a store, we use the date of the first observed tested hire to
proxy for the rollout date. First stage estimates of this equation are found in Appendix Table 1. The
coefficient on the store-adoption dummy in the first stage equation of 0.89 (t = 111) indicates that
once a store has adopted testing, the vast majority of subsequent hires are tested.

Instrumental  variables  estimates  of  the  effect  of  testing  on  job  spell  durations  in  panel  B  of  Table 


The  flow  of  hires  in  our  sample  intrinsically  overrepresents  workers  hired  at  high-turnover  stores  (relative  to  the  stock 
of hires). Hence, when testing is introduced, a disproportionate share of tested hires are at high turnover establishments.
Adding  site  effects  to  the  model  controls  for  this  source  of  composition  bias,  which  substantially  raises  the  point  estimate 
for  the  job  testing  variable  (compare  columns  3  and  4). 

Models that include a full set of state × month-year-of-hire interactions (17 × 47 dummies) yield nearly identical
(and  quite  precise)  point  estimates. 

Changes  to  the  Unicru  system  implemented  after  the  close  of  our  sample  window  effectively  barred  such  overrides. 
3 are approximately 80 percent as large as the OLS estimates and are nearly as precisely estimated. In
fact, we cannot reject the hypothesis that IV and OLS estimates are identical. This suggests that the
potential endogeneity of tested status within stores is not a substantial source of bias.
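A minimal Python sketch of the just-identified IV estimator, with store adoption as the instrument for tested status, may make the logic concrete. The simulation is hypothetical: after adoption, managers skip the test for some applicants with low unobserved quality, which makes tested status endogenous and biases OLS upward, while adoption itself is independent of quality. Function names and parameter values are illustrative only.

```python
import numpy as np


def just_identified_iv(y, endog, instrument, exog):
    """IV coefficient on `endog`: beta = (Z'X)^{-1} Z'y.

    `exog` columns (here just a constant) serve as their own instruments.
    """
    X = np.column_stack([endog, exog])
    Z = np.column_stack([instrument, exog])
    return np.linalg.solve(Z.T @ X, Z.T @ y)[0]


def ols(y, regressor, exog):
    """OLS coefficient on `regressor`, controlling for `exog`."""
    X = np.column_stack([regressor, exog])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]


# Hypothetical simulation of managers bypassing the test after adoption.
rng = np.random.default_rng(0)
n = 20_000
adopted = (rng.random(n) < 0.5).astype(float)    # store-adoption dummy (instrument)
quality = rng.normal(size=n)                      # unobserved worker quality
bypassed = (quality < -1) & (rng.random(n) < 0.5) # weak applicants hired untested
tested = adopted * (~bypassed)                    # endogenous tested status
duration = 100 + 20 * tested + 30 * quality + rng.normal(0, 20, size=n)
const = np.ones(n)

ols_hat = ols(duration, tested, const)
iv_hat = just_identified_iv(duration, tested, adopted, const)
print(f"OLS: {ols_hat:.1f}  IV: {iv_hat:.1f}  (true effect 20)")
```

In this setup OLS overstates the effect because untested hires at adopting stores are disproportionately weak, while IV recovers the true value; the direction matches the pattern of IV estimates below OLS discussed in the text.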

A second source of concern is that a store's use of testing may be correlated with potential
outcomes. Although all stores in our sample adopt testing during the sample period, the timing of adoption is
not necessarily entirely random. To the best of our understanding, the rollout order of stores was
determined by geography, technical infrastructure, and internal personnel decisions. It is this last factor
that is of concern. If, for example, stores adopted testing when they experienced a rise in turnover,
mean reversion in the length of employment spells could cause us to overestimate the causal effect of
testing on workers' job spell durations.

As a check on this possibility, we augmented equation (7) for job spell duration with leads and lags
of test adoption. These models, found in Appendix Table 2, estimate the trend in job spell durations
for workers hired at each store in the 9 months surrounding introduction of testing: 5 months prior
to 4 months post adoption. If job spell durations rose or fell significantly prior to test adoption, the
lead and lag models would make this evident.

As  shown  in  the  appendix  table,  the  lead  estimates  are  in  no  case  significant  and,  moreover,  do 
not  have  consistent  signs.  By  contrast,  the  lag  (post-rollout)  dummies  show  striking  evidence  of  a 
discontinuous  rise  in  job  duration  for  workers  hired  immediately  after  testing  was  adopted.  Workers 
hired  in  the  first  month  of  testing  have  14  days  above  average  duration;  workers  hired  in  subsequent 
months  have  19  to  28  days  above  average  duration  (in  all  cases  significant).  These  results  indicate 
that our main estimates above are not confounded by pre-existing trends in job spell duration.
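One way to build the lead and lag indicators is to compute each hire's event time (hire month minus the store's adoption month) and flag each value in the window from 5 months before to 4 months after adoption. The Python sketch below does this; the function name and the convention that hires outside the window receive all-zero indicators are assumptions, since the exact reference period used in Appendix Table 2 is not stated here.

```python
import numpy as np


def event_time_dummies(hire_month, adoption_month, first=-5, last=4):
    """Indicators for hires made k months relative to their store's test adoption.

    Column j flags hires with event time first + j; hires outside the
    [first, last] window get a row of zeros (an assumed convention).
    """
    event_time = np.asarray(hire_month) - np.asarray(adoption_month)
    window = np.arange(first, last + 1)
    dummies = (event_time[:, None] == window[None, :]).astype(int)
    return dummies, window


# Hypothetical hires (months numbered consecutively) at stores adopting in month 12.
hire_month = np.array([10, 12, 15, 20, 3])
adoption_month = np.full(5, 12)
dummies, window = event_time_dummies(hire_month, adoption_month)
print(window)   # event times -5 ... 4
print(dummies)
```

In a regression such as equation (7), the coefficients on the lead columns (negative event times) would capture any trend in job spell duration before adoption.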

Quantile  regressions 

Since employment duration data are typically right-skewed, our results could also be driven in
part by outliers. As a check on this possibility, Panel A of Table 4 presents quantile (least absolute
deviation) regression models for job duration. In these models, we retain the 2 percent of observations
in which the job spell had yet to be completed by the end of the sample (August 2003). Since it is
not feasible to estimate a large number of store fixed effects in quantile regression models, we instead
include 46 state dummies.

The regression estimates for median job spell duration confirm that testing increased the length


The fact that IV point estimates are smaller than OLS estimates implies that non-tested hires at stores using testing
had  below  average  job  duration  relative  to  other  non-tested  hires.  This  is  consistent  with  some  managerial  subversion. 

As an additional robustness test, we estimated a version of equation (7) augmented with separate test-adoption
dummies  for  each  cohort  of  adopting  stores,  where  a  cohort  is  defined  by  the  month  and  year  of  adoption.  These 
estimates  find  a  positive  effect  of  testing  on  job  spell  duration  for  9  of  12  adopter  cohorts,  6  of  which  are  significant  at 
p  <  0.05.  By  contrast,  none  of  the  3  negative  point  estimates  is  close  to  significant. A  table  of  estimates  is  available  from 
the  authors. 
of job spells. In the models in panel A, we find that testing increased median tenure by 8 to 9 days,
which is roughly a 10 percent increase (see Table 1), comparable in effect size to the OLS models.
Panel B provides estimates for job spell length at percentiles 10, 25, 50, 75, and 90. The impact
of testing on completed tenure is statistically significant and monotonically increasing in magnitude
from the 10th to the 75th percentiles. We find no effect at the 90th percentile. In net, these results
provide robust evidence that job testing raised worker tenure.
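The least-absolute-deviation logic behind these models can be seen in the simplest possible setting: a quantile regression of duration on an intercept and a testing dummy, with no state dummies or other controls. In that saturated case the fitted conditional quantiles are the sample quantiles of each group, so the testing coefficient is their difference. The Python sketch below uses simulated, right-skewed durations; the function names are hypothetical, and `method="inverted_cdf"` requires NumPy 1.22 or later.

```python
import numpy as np


def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u), summed over observations."""
    return np.sum(u * (tau - (u < 0)))


def dummy_quantile_effect(y, d, tau):
    """Intercept and slope of a tau-quantile regression of y on 1 and binary d.

    With this saturated design the fitted quantiles are the group sample
    quantiles; method="inverted_cdf" returns an exact minimiser of the check loss.
    """
    q1 = np.quantile(y[d == 1], tau, method="inverted_cdf")
    q0 = np.quantile(y[d == 0], tau, method="inverted_cdf")
    return q0, q1 - q0


# Hypothetical right-skewed durations (days); not the paper's data.
rng = np.random.default_rng(0)
d = np.repeat([0, 1], 2000)
y = np.where(d == 1, rng.exponential(100, 4000), rng.exponential(90, 4000))
for tau in (0.10, 0.25, 0.50, 0.75, 0.90):
    _, effect = dummy_quantile_effect(y, d, tau)
    print(f"tau={tau:.2f}: effect = {effect:.1f} days")
```

With store or state controls the problem no longer reduces to group quantiles, and a linear-programming quantile regression routine would be needed instead.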

4.1      Did  testing  have  a  disparate  impact  on  productivity? 

Table 1 reveals that, prior to the use of job testing, Hispanic and especially black workers had
substantially  shorter  mean  job  durations  than  whites.  Job  testing  could  potentially  affect  this  gap. 
As  our  model  indicates,  unless  firms  were  statistically  discriminating  in  the  pre-testing  regime,  an 
increase  in  screening  precision  is  predicted  to  differentially  raise  the  productivity  of  minority  relative 
to  non-minority  hires  (a  disparate  impact).  We  analyze  here  whether  this  occurred.  Before  doing  so, 
we  calculate  an  upper  bound  on  the  plausible  magnitude  of  this  impact. 

Consider a hypothetical case where, prior to testing, screening was uncorrelated with the test score.
We refer to this as the 'unsystematic selection benchmark.' Panel A of Table 2 shows that among
tested applicants, the black-white test score gap was 5.4 points (47.7 versus 53.1 points). Under the
'unsystematic selection' benchmark, we assume that this gap would have carried over into the hired
sample in its entirety. By contrast, Panel B of Table 2 shows that among tested hires, the black-white
test score gap was only 1.5 points. Hence, relative to the benchmark, hiring using the test reduced the
black-white test score gap among hires by 3.9 points. The analogous figure for Hispanic hires is 2.9
points. These gains (3.9 and 2.9 points) place an upper bound on the degree to which testing could
plausibly have compressed the minority/non-minority test score gap among hires.
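The bound above is simple arithmetic: under unsystematic selection the applicant gap would pass through to hires unchanged, so the bound is the applicant gap minus the gap actually observed among tested hires. The snippet below restates it using only the figures quoted in the text; the variable names are illustrative.

```python
# Mean test scores quoted from Table 2 in the text (points).
white_applicants, black_applicants = 53.1, 47.7
applicant_gap = white_applicants - black_applicants   # 5.4 points among applicants
hire_gap = 1.5                                        # black-white gap among tested hires
upper_bound_black = applicant_gap - hire_gap          # 3.9 points
upper_bound_hispanic = 2.9                            # reported directly in the text
print(f"{applicant_gap:.1f} {upper_bound_black:.1f} {upper_bound_hispanic:.1f}")
```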

To translate this point difference into a productivity difference, we use the job applicant database
summarized in Table 2 to estimate the relationship between applicant test scores and job spell
durations. As noted, test scores are not available for applications submitted to the stores in our sample
prior to the use of testing. In their place, we use job applications submitted in the year after the
rollout of employment testing (June 2000 through May 2001). Assuming that applicant characteristics
did not change systematically after testing was initiated, these data provide a rough measure of the
average characteristics of stores' applicants in the period prior to testing. (Supporting evidence for
this assumption is given in Appendix 2.)


We  exclude  incomplete  spells  since  some  are  at  very  high  percentiles. 

While this case is unlikely, the evidence above that testing significantly raised productivity indicates that the initial
screen could not have been perfectly correlated with the test.
Column 1 of Table 5 provides an estimate of the following regression model for job spell durations:

$$D_{ijt} = \alpha + X_{ijt}\beta_1 + \beta_2 S_j + \theta_t + \varepsilon_{ijt}. \qquad (8)$$

In this equation, the dependent variable is the completed job spell duration of workers hired at store
$j$ prior to the use of testing, and $S_j$ is the average test score of store $j$'s applicants. Control variables
for gender, race, year-month of hire and state are also included. Our expectation is that $\beta_2 > 0$: stores
that had higher quality applicants (as measured by the test score) should have had longer mean job
spell durations prior to the use of testing.

This expectation is confirmed in Table 5. The coefficient of 2.73 (t = 5.0) on the mean test score
variable indicates that, conditional on race, gender, time and state effects, stores facing applicant pools
with below average mean test scores had significantly shorter job spells: a one point lower mean test
score is associated with approximately 3 fewer days mean job duration for workers hired prior to the
use of testing. The economic magnitude of this relationship is large. A one-standard deviation (3.7
point) difference in average store-level test scores predicts a 10 day difference in mean job duration.

We can calculate an upper bound on expected disparate productivity impacts of testing by using
this regression estimate. Under the unsystematic selection benchmark, we calculated that testing could
potentially have closed the black-white test gap by 3.9 points. Scaling by the estimated coefficient of 2.73,
this implies a potential 11-day narrowing of the job duration gap between black and white hires. This is a sizable effect, equal
to one third of the initial gap of 33 days (Table 5, column 1). An analogous calculation for Hispanic
hires yields a potential disparate impact of 8 days on a baseline of 7 days, i.e., full convergence. Hence,
under the null of unsystematic selection, job testing had the potential to substantially raise the tenure
of minority relative to non-minority hires.
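The translation from test-score points to days of tenure can be written out explicitly. The inputs below are the figures quoted in the text (the 2.73-day coefficient, the 3.7-point standard deviation, the 3.9- and 2.9-point bounds, and the 33- and 7-day baseline gaps); the variable names are illustrative.

```python
# Inputs quoted in the text above.
days_per_point = 2.73            # coefficient on store mean applicant score (Table 5, col. 1)
sd_store_mean_score = 3.7        # one s.d. of store-level mean test scores
bound_points = {"black": 3.9, "hispanic": 2.9}   # unsystematic-selection bounds
baseline_gap_days = {"black": 33, "hispanic": 7}

print(f"one s.d. of store mean score: {days_per_point * sd_store_mean_score:.1f} days")
for group, points in bound_points.items():
    narrowing = days_per_point * points
    share = narrowing / baseline_gap_days[group]
    print(f"{group}: potential narrowing {narrowing:.1f} days = {share:.0%} of the gap")
```

For Hispanic hires the potential narrowing slightly exceeds the 7-day baseline gap, which is why the text describes it as full convergence.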

To  assess  whether  this  occurred,  we  estimate  in  Table  6  a  set  of  job  spell  duration  models  performed 
separately  by  race.  These  estimates  provide  remarkably  little  evidence  of  disparate  impacts.  The  point 
estimate  for  the  effect  of  testing  on  mean  job  duration  is  20  days  for  whites,  23  days  for  blacks,  19  days 
for  males  and  20  days  for  females.  All  are  significant.  Only  for  Hispanic  hires  (the  smallest  sub-group 
in our sample) is the point estimate of differing magnitude: 8 days, and insignificant.

The second panel of Table 6 presents analogous IV models for job spell duration by race where
tested status is instrumented with a dummy variable indicating the store has adopted job testing.
As with earlier models, the instrumental variables point estimates are about 80 percent as large as

We do not estimate equation (8) for job spell durations of tested hires since selection on the test score would be
expected to attenuate the estimated coefficient on $S_j$. As shown in Appendix 1, this relationship is positive in the tested sample.
But, as expected, it is substantially attenuated relative to the non-tested sample.

As shown in Appendix Table 3, this relationship is also robust to inclusion of other demographic and regional controls,
including  log  median  income  and  minority  resident  share  in  the  store  zip  code. 

A potential explanation for why the gains were smaller for Hispanic hires than other groups is that the test was
initially  only  offered  in  English. 
comparable OLS estimates and are only slightly less precisely estimated. In this case, gains in job
duration for whites are estimated to be slightly larger than for blacks. In summary, these results
provide little evidence that testing had a disparate impact on the productivity of minority relative to
non-minority hires.

4.2     A  second  productivity  measure:  Firing  for  cause 

To supplement the job duration evidence above, we explore a second dimension of worker productivity:
firing for cause. Using linked personnel records, we distinguish terminations for cause (theft, job
abandonment, insubordination) from neutral or positive terminations, such as return to school,
relocation, or new employment. To provide an outcome measure that is uniformly defined across
workers at different points in their employment spells, we measure employment status at 180 days
following hire. We code three mutually exclusive categories: employed, neutral termination, and
terminated for cause. As shown in the first panel of Figure 4, two-thirds of job spells have ended at
180 days following hire, and 22 percent of spells have resulted in termination for cause.

To compare termination outcomes of tested and non-tested workers, we estimate the following
linear probability model for employment status at 180 days:

$$E\left[1\{O_{ijt} = k\}\right] = \alpha + X_{ijt}\beta_1 + \beta_2 T_{ijt} + \theta_t + \varphi_j, \qquad (9)$$

where $1\{\cdot\}$ is the indicator function and $k$ corresponds to each of the three potential employment
outcomes ($O$): employed, neutral termination, and termination for cause. So that coefficients may be
read as percentage points, the dependent variable is multiplied by 100. The coefficient of interest,
$\beta_2$, estimates the conditional mean difference in the probability of each outcome for tested relative to
non-tested hires.
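Because the three outcomes are mutually exclusive and exhaustive, the three indicators in equation (9) sum to one for every worker (100 after scaling). Fitting the three linear probability models by OLS on the same regressors therefore forces the testing coefficients to sum to zero, a useful internal check on such estimates. The Python sketch below illustrates this on simulated data; the outcome probabilities, control variables, and function name are hypothetical, and the store and time effects of equation (9) are omitted.

```python
import numpy as np

OUTCOMES = ("employed", "neutral termination", "terminated for cause")


def lpm_by_outcome(outcome, tested, controls):
    """Fit 100*1{O = k} = a + X b + beta_k T + e by OLS for each outcome k; return beta_k."""
    X = np.column_stack([np.ones(len(tested)), tested, controls])
    betas = {}
    for k, name in enumerate(OUTCOMES):
        y = 100.0 * (outcome == k)   # scaled so coefficients read as percentage points
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        betas[name] = coef[1]
    return betas


# Hypothetical simulated data; outcome probabilities are illustrative only.
rng = np.random.default_rng(1)
n = 50_000
tested = rng.integers(0, 2, size=n)
controls = rng.integers(0, 2, size=(n, 2))   # hypothetical demographic dummies
outcome = np.where(tested == 1,
                   rng.choice(3, size=n, p=[0.35, 0.44, 0.21]),
                   rng.choice(3, size=n, p=[0.31, 0.47, 0.22]))

betas = lpm_by_outcome(outcome, tested, controls)
for name, b in betas.items():
    print(f"{name}: {b:+.1f} pp")
print(f"sum across outcomes: {sum(betas.values()):.2e}")
```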

Table  7  contains  estimates.  The  first  specification,  which  excludes  the  job  testing  dummy  variable, 
indicates  that  180  days  after  hire,  minority  workers  are  substantially  more  likely  than  non-minorities 
to  have  been  fired  for  cause.  As  with  the  racial  differences  in  mean  tenure,  these  discrepancies  are 
large.  Relative  to  whites,  black  and  Hispanic  workers  are,  respectively,  9  and  3  percentage  points  (47 
percent  and  15  percent)  more  likely  to  have  been  terminated  for  cause  within  the  first  180  days  of 
hire. 

Column  2  contrasts  employment  outcomes  of  tested  relative  to  non-tested  hires.  At  180  days 
following  hire,  tested  workers  are  4.4  percentage  points  (14  percent)  more  likely  than  are  non-tested 


The overall rise in tenure of 19 to 22 days (Table 3, columns 5 and 6) implies that the use of test-based screening was
equivalent  to  a  rise  of  7  to  8  points  in  the  average  test  scores  of  hires.  If  the  firm  was  initially  hiring  unsystematically, 
this  rise  would  have  been  fully  21  points  (using  the  average  scores  of  hires  minus  the  average  scores  of  applicants  in  Table 
2).  Clearly,  testing  improved  screening,  but  screening  was  far  from  unsystematic  initially. 

Results  are  similar  if  we  use  120  or  240  days  instead.  Workers  terminated  for  cause  are  ineligible  for  rehire. 
workers to remain employed, 3.1 percentage points (6.7 percent) less likely to have received a neutral
termination, and 1.4 percentage points (6.5 percent) less likely to have been terminated for cause.
The first two point estimates are highly significant; the third is marginally significant (t = 1.5). As
shown in Column 3, instrumental variables estimates for these models (using store test-adoption as an
instrument) show comparable effects. Hence, tested hires appear to have better termination outcomes
across the board.

The  large  racial  differences  in  termination  outcomes  evident  in  Column  1  again  underscore  that 
job  testing  has  the  potential  to  generate  disparate  impacts  by  raising  minority  relative  to  non-minority 
productivity.  We  can  benchmark  the  possible  magnitude  of  these  impacts  using  the  procedure  above. 
As  shown  in  Panel  B  of  Table  5,  stores  facing  applicant  pools  with  below  average  mean  test  scores 
had  significantly  higher  rates  of  termination  for  cause:  a  one  point  lower  mean  applicant  test  score 
was  associated  with  a  0.41  percentage  point  higher  share  of  workers  terminated  for  cause  within 
180  days.  Under  the  'unsystematic  selection'  benchmark,  we  calculate  that  the  use  of  job  testing 
would be expected to compress the black-white termination-for-cause gap by 1.6 percentage points
(0.41 × 3.9), and the Hispanic-white termination-for-cause gap by 1.2 percentage points (0.41 × 2.9).
These  reductions  are  substantial,  equal  to  18  to  40  percent  of  the  baseline  difference  in  termination 
rates. 
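The termination benchmark follows the same steps as the tenure benchmark. The snippet below reproduces it from the quoted inputs: the 0.41 percentage-point slope, the 3.9- and 2.9-point bounds, and the 9 and 3 percentage-point baseline gaps from the first specification of Table 7. Variable names are illustrative.

```python
slope_pp_per_point = 0.41                          # Table 5, panel B
bound_points = {"black": 3.9, "hispanic": 2.9}     # unsystematic-selection bounds
baseline_gap_pp = {"black": 9.0, "hispanic": 3.0}  # Table 7, first specification

compression = {g: slope_pp_per_point * pts for g, pts in bound_points.items()}
share_of_gap = {g: compression[g] / baseline_gap_pp[g] for g in compression}
for g in compression:
    print(f"{g}: {compression[g]:.1f} pp, {share_of_gap[g]:.0%} of baseline gap")
```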

We  find  no  evidence  of  a  disparate  impact  of  testing  on  terminations,  however.  As  shown  in 
panel  B  of  Table  6,  the  point  estimates  imply  that  testing  reduced  termination  rates  -  both  neutral 
terminations  and  firings  for  cause  -  by  roughly  equal  amounts  for  workers  of  all  three  race  groups. 

In  net,  our  results  indicate  that  job  testing  improved  worker  selection,  leading  to  longer  job  spell 
durations  and  a  reduction  in  the  frequency  of  firing  for  cause.  Most  important  for  our  analysis, 
we  find  no  evidence  of  disparate  impacts;  productivity  gains  were  uniformly  large  for  minority  and 
non-minority hires. In light of our theoretical framework, this suggests that firms may have held
rational expectations in the pre-testing hiring regime; that is, they accurately accounted for expected
productivity  differences  when  selecting  applicants.  In  this  case,  our  model  suggests  that  disparate 
impacts  on  minority  hiring  are  likely  to  be  small. 

5     The  impact  of  employment  testing  on  hiring 

5.1     Unsystematic  selection  baseline 

We now assess whether testing had a disparate impact on minority hiring. Before doing so, we
benchmark the potential magnitude of this impact. As shown in Table 2, there are significant differences
in test scores among black, white and Hispanic job applicants. Figure 5, which plots locally weighted
regressions of hiring rates on test scores (conditioning on store effects and application year × month),
shows that, for applicants of all race groups, the probability of hire is strongly monotonically
increasing in the test score. The overall hire rate is 8.9 percent, but applicants who score one standard
deviation below the mean have essentially zero probability of hire, while those who score one standard
deviation above the mean have a 12 to 15 percent probability of hire. The importance of test scores
for hiring is also visible in Figure 6, which plots the distribution of test scores for applicants who were
subsequently hired. In contrast to the test score distributions for job applicants shown in Figure 1,
the race difference in test scores among job hires is negligible. This suggests that race differences in
test scores could have significant disparate impacts on hiring.

To benchmark these impacts, we again consider an unsystematic selection baseline. Using the data
for white applicants exclusively, we estimate the following linear probability model for hiring:

$$E(H_i) = \sum_{n=1}^{100} \pi_n \times 1\{S_i = n\}. \qquad (10)$$

Here, the dependent variable is an indicator equal to 1 if applicant $i$ was hired, $S$ is the applicant's
test score and $1\{\cdot\}$ is the indicator function. The coefficients, $\pi_n$, estimate the hire rates for white
applicants at each test score percentile. We can apply this coefficient vector to the test score
distribution for each race group to calculate predicted hiring rates on the assumption that firms use
the same selection rules for all applicants. These predicted rates are 10.2 percent for white applicants
(equal to the white mean by construction), 7.7 percent for black applicants and 9.3 percent for Hispanic
applicants. These race gaps in predicted hiring rates are sizable. If hiring was initially uncorrelated
with the test, testing would cause the black hire rate to fall by 2.5 percentage points (25 percent) and
the Hispanic hire rate by 1 percentage point (10 percent). As we show below, disparate impacts of
this magnitude are detectable in our sample. We now assess if they occurred.
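Equation (10) amounts to applying the white applicants' empirical hire rate at each score to another group's score distribution. The Python sketch below does this with simulated applicant pools; the score distributions, hiring curve, and function names are hypothetical, and the site effects that the text notes are controlled for when estimating the $\pi_n$ are omitted. By construction, applying the white rates to the white score distribution reproduces the white hire rate exactly.

```python
import numpy as np


def white_hire_rates(scores, hired, n_bins=100):
    """pi_n from eq. (10): hire rate of white applicants at each score n = 1..n_bins."""
    pi = np.full(n_bins + 1, np.nan)   # index 0 unused so pi[n] matches score n
    for n in range(1, n_bins + 1):
        at_n = scores == n
        if at_n.any():
            pi[n] = hired[at_n].mean()
    return pi


def predicted_hire_rate(pi, group_scores):
    """Apply the white selection rule to another group's score distribution.

    Assumes every score in `group_scores` also occurs among white applicants;
    otherwise the result is NaN.
    """
    return pi[group_scores].mean()


def draw_scores(rng, mean, size):
    """Hypothetical scores on a 1-100 scale."""
    return np.clip(np.rint(rng.normal(mean, 20, size)), 1, 100).astype(int)


rng = np.random.default_rng(0)
white_scores = draw_scores(rng, 53.1, 200_000)
black_scores = draw_scores(rng, 47.7, 50_000)
hire_prob = 0.25 / (1 + np.exp(-(white_scores - 60) / 8))   # rises with score
white_hired = (rng.random(white_scores.size) < hire_prob).astype(float)

pi = white_hire_rates(white_scores, white_hired)
print(f"white: {predicted_hire_rate(pi, white_scores):.3f} "
      f"(observed {white_hired.mean():.3f}), "
      f"black: {predicted_hire_rate(pi, black_scores):.3f}")
```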


We  also  estimated  linear  probability  models  for  hiring  odds  as  a  function  of  test  score,  store  effects,  time  effects,  and 
race  and  gender.  We  estimate  that  a  one  standard  deviation  (20  point)  increase  in  the  test  score  raises  an  applicant's 
hiring  probability  by  4.6  percentage  points  (t  =  67).  Given  a  baseline  hiring  rate  of  9  percent,  this  is  a  large  effect.  A 
table  of  estimates  is  available  from  the  authors. 

When estimating $\pi_n$, we also control for site effects. This has little effect on the results.

As is visible in Table 2 panel C, observed hiring rates for tested black and Hispanic applicants are in fact lower
than the predicted rates. This discrepancy is also suggested by Figure 5 where, conditional on test scores, minority
applicants are generally less likely to be hired than non-minorities. Although this discrepancy could potentially be
explained by taste-based discrimination, our model also predicts this pattern. During job interviews, firms will observe
applicant characteristics that are not visible in our data, such as dress, comportment, and maturity. These observables
are represented by $\hat{\eta}$ in our model. Provided that $\hat{\eta}$ is unbiased, our model immediately implies that minority applicants
will have weaker observables than non-minority applicants conditional on their test scores: $E(\hat{\eta} \mid S = k, x = b) < E(\hat{\eta} \mid S = k, x = a)$.
(A proof is available on request.) Hence, our model implies that minority applicants will have a lower hire
rate than non-minorities conditional on their scores, which is what we observe in Figure 5.
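The claim that minority applicants have weaker observables conditional on their test scores can be illustrated with a stylized normal signal-extraction model. This is an assumption made for the sketch, not necessarily the paper's own model: productivity is normal with a group-specific mean, the test score adds independent noise, and the interview signal is unbiased. The function name and parameter values are hypothetical.

```python
def expected_signal_given_score(k, group_mean, var_eta, var_test_noise):
    """E(eta_hat | S = k, x) in a stylized normal model (an assumption).

    eta ~ N(group_mean, var_eta) is productivity, the test score is
    S = eta + e, and the interview signal eta_hat = eta + u is unbiased,
    with e and u mean-zero noise independent of eta and of each other.
    Then E(eta_hat | S = k) = E(eta | S = k), which shrinks k toward the
    group mean in proportion to the test's unreliability.
    """
    reliability = var_eta / (var_eta + var_test_noise)
    return group_mean + reliability * (k - group_mean)


# Hypothetical parameters: group b has a lower mean than group a.
mean_a, mean_b = 0.0, -0.5
for k in (-1.0, 0.0, 1.0):
    gap = (expected_signal_given_score(k, mean_b, 1.0, 1.0)
           - expected_signal_given_score(k, mean_a, 1.0, 1.0))
    # gap = (1 - reliability) * (mean_b - mean_a) = -0.25 at every k
    print(k, gap)
```

In this sketch the gap does not depend on the score $k$ and disappears only when the test is perfectly reliable, which conveys why group membership can still carry information once the score is known.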
5.2     Evidence  on  disparate  hiring  impacts

As shown in Panel A of Table 1, simple mean comparisons of minority employment before and after
the use of testing suggest that job testing had little effect on minority hiring. In fact, the employment
share of white workers fell by roughly 4.6 percentage points (from 69.9 to 65.3 percent of hires) in the
year following the introduction of testing. This uncontrolled comparison could potentially mask
within-store shifts against minority hiring, however.

To rigorously assess the effect of testing on racial composition, it is useful to derive a link between
the hiring rates observed in the data and the underlying parameter of interest, which is the effect of
testing on hiring odds for minority applicants. The data allow us to observe the race of new hires, which
we express as $\Pr(B \mid H, A)$, that is, the probability that a new worker is black given that he applied
(A) and was hired (H). Using Bayes' rule, we can write the following identity for the black/non-black
(B/NB) hiring odds ratio:

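$$\ln\left(\frac{\Pr(B \mid H, A)}{\Pr(NB \mid H, A)}\right) = \ln\left(\frac{\Pr(H \mid B, A) \cdot \Pr(B \mid A)}{\Pr(H \mid A)}\right) - \ln\left(\frac{\Pr(H \mid NB, A) \cdot \Pr(NB \mid A)}{\Pr(H \mid A)}\right). \quad (11)$$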

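Rearranging, we obtain

$$\ln\left(\frac{\Pr(B \mid H, A)}{\Pr(NB \mid H, A)}\right) = \ln\left(\frac{\Pr(H \mid B, A)}{\Pr(H \mid NB, A)}\right) + \ln\left(\frac{\Pr(B \mid A)}{\Pr(NB \mid A)}\right). \quad (12)$$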

This equation indicates that the odds that a newly hired worker is a minority depend on the hiring
odds for minority versus non-minority applicants and the relative application rates of minorities and
non-minorities.
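Because equation (12) is an accounting identity, it can be checked numerically. The Python sketch below computes the left-hand side directly from Bayes' rule and compares it with the sum of the two right-hand-side terms; the function names and input probabilities are hypothetical.

```python
import math


def log_odds_hire_is_black(p_hire_b, p_hire_nb, share_b):
    """Left-hand side of equation (12), computed directly from Bayes' rule.

    p_hire_b = Pr(H | B, A), p_hire_nb = Pr(H | NB, A), share_b = Pr(B | A).
    """
    p_hire = p_hire_b * share_b + p_hire_nb * (1.0 - share_b)
    p_b_given_hire = p_hire_b * share_b / p_hire
    return math.log(p_b_given_hire / (1.0 - p_b_given_hire))


def decomposed_log_odds(p_hire_b, p_hire_nb, share_b):
    """Right-hand side of equation (12): relative hiring term plus
    relative application term."""
    return math.log(p_hire_b / p_hire_nb) + math.log(share_b / (1.0 - share_b))


# Hypothetical inputs: hire probabilities of 7% and 10%, 20% black applicants.
print(log_odds_hire_is_black(0.07, 0.10, 0.20))
print(decomposed_log_odds(0.07, 0.10, 0.20))  # same value, ln(0.175)
```

Changing only the applicant share moves the left-hand side through the second term alone, which is why a shift in application rates would confound a comparison of hire shares unless it is absorbed, for example by store effects.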

Our empirical question concerns how testing affects the hiring odds for minorities. The second
term in equation (12), the minority application rate, is a confounding variable that we would like
to eliminate. The lack of data on the composition of job applicants prior to the introduction of
testing is therefore a point of some concern. Although we have no evidence suggesting that testing
altered the racial composition of applicants, we also cannot offer evidence against this hypothesis.
One might speculate, for example, that because the computerized application requires applicants to
submit a social security number and authorize a criminal background check, this could differentially
discourage minority applicants. If so, this would bias our results towards finding that job testing
reduced minority hiring, which is not what we find.

As an empirical analog to equation (12), consider the following conditional ('fixed-effects') logit
model:

$$E(B_{ijt} \mid H_{ijt}, A_{ijt}, T_{ijt}, \theta_t, \varphi_j) = F(\theta_t + \varphi_j + \beta_T T_{ijt}), \quad (13)$$

Unicru personnel interviewed for this research believe that application kiosks are enjoyable to use and hence yield
more applicants.

Pettit and Western (2004) estimate that, among men born between 1965 and 1969, 3 percent of whites and 20 percent
of blacks had served time in prison by their early thirties.
where $B$ indicates that a hired worker is black, the vectors $\varphi$ and $\theta$ contain a complete set of store
and month-by-year of hire dummies, and $F(\cdot)$ is the cumulative logistic function. The coefficient,
$\beta_T$, measures the impact of job testing on the log odds that a newly hired worker is black. Without
further assumptions, $\beta_T$ captures the combined impact of testing on both relative application rates
and hiring odds by race. If we assume that minority application rates are roughly constant within
stores, these will be eliminated by the store fixed effects, $\varphi$. In this case, $\beta_T$ captures the impact of
testing on hiring odds by race, which is the parameter of interest.

To avoid the incidental parameters problem that arises when estimating a maximum likelihood
model with a very large number of fixed effects (1,363), we estimate equation (13) using a conditional
logit model. This estimator effectively 'conditions out' time-invariant store-specific factors, which
include, by assumption, relative minority/non-minority application rates.
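Why the conditional logit sidesteps the store effects can be shown by brute force: within a store, once we condition on how many hires are black, the probability of which hires they are no longer depends on the store intercept. The Python sketch below checks this on a hypothetical five-hire store; the function names and numbers are illustrative, and the month effects in equation (13) are left out for brevity.

```python
import itertools
import math


def outcome_prob(y, t, alpha, beta):
    """P(y | t) for independent logit outcomes within one store.

    alpha is the store effect, beta the coefficient on the tested dummy t,
    and y[i] = 1 if hire i is black.
    """
    prob = 1.0
    for yi, ti in zip(y, t):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * ti)))
        prob *= p if yi else 1.0 - p
    return prob


def conditional_prob(y, t, alpha, beta):
    """P(y | t, sum(y)) by enumerating every outcome vector that has the
    same number of black hires as y."""
    m = sum(y)
    total = sum(outcome_prob(d, t, alpha, beta)
                for d in itertools.product((0, 1), repeat=len(y))
                if sum(d) == m)
    return outcome_prob(y, t, alpha, beta) / total


def conditional_prob_closed_form(y, t, beta):
    """The same probability written without alpha: exp(beta * sum(t*y))
    divided by the sum of exp(beta * sum(t*d)) over d with sum(d) == sum(y)."""
    m = sum(y)
    num = math.exp(beta * sum(yi * ti for yi, ti in zip(y, t)))
    den = sum(math.exp(beta * sum(di * ti for di, ti in zip(d, t)))
              for d in itertools.product((0, 1), repeat=len(y))
              if sum(d) == m)
    return num / den


# Hypothetical store: five hires, three of them tested, two of them black.
y = (1, 0, 0, 1, 0)
t = (1, 1, 0, 0, 1)
for alpha in (-3.0, 0.0, 2.0):
    print(alpha, conditional_prob(y, t, alpha, beta=0.4))  # identical across alpha
print(conditional_prob_closed_form(y, t, beta=0.4))
```

Enumeration grows exponentially with store size, so it serves only as a check; practical implementations evaluate the same conditional likelihood with a recursive computation of the denominator.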

The top panel of Table 8 reports estimates of equation (13) for the hiring of white, black and
Hispanic workers. These models yield no evidence that employment testing affected relative hiring
odds by race. In all specifications, the logit coefficient on the job testing dummy variable is small
relative to its standard error (z < 1), and its magnitude is economically insignificant. The estimated
impact of testing on the hiring probability of blacks and Hispanics is -0.3 and -0.2 percentage points,
respectively.

As  a  robustness  test  for  the  conditional  logit  estimates,  we  also  fit  a  simple  fixed-effects,  linear 
probability  model  of  the  form: 

$$E(B_{ijt} \mid H_{ijt}, A_{ijt}) = \alpha + \beta_T T_{ijt} + \theta_t + \varphi_j. \quad (14)$$

This model contrasts the share of hires by race at each store among tested and non-tested hires.
Although the linear model is technically misspecified for this problem, it may provide more power to
detect a small change in the racial composition of hires.
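A minimal Python sketch of the linear probability model in equation (14), assuming NumPy is available: it builds explicit store and month dummies and solves the least-squares problem, returning the coefficient on the tested dummy in percentage points. To make the mechanics checkable, the example feeds in noise-free expected black shares instead of 0/1 outcomes, with a hypothetical true effect of 2 percentage points; the function name and data are illustrative.

```python
import numpy as np


def fe_lpm_testing_effect(is_black, tested, store, month):
    """OLS estimate of beta_T in E(B) = beta_T * T + theta_t + phi_j.

    is_black is the per-hire outcome (scaled to 0-100 so the coefficient
    reads in percentage points); tested, store and month are per-hire
    arrays. Store dummies absorb the constant, and the first month dummy
    is dropped to avoid perfect collinearity.
    """
    y = 100.0 * np.asarray(is_black, dtype=float)
    t = np.asarray(tested, dtype=float)
    store = np.asarray(store)
    month = np.asarray(month)
    stores = np.unique(store)
    months = np.unique(month)
    store_dummies = (store[:, None] == stores[None, :]).astype(float)
    month_dummies = (month[:, None] == months[None, 1:]).astype(float)
    X = np.column_stack([t, store_dummies, month_dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]


# Hypothetical panel: 3 stores x 4 months with staggered adoption of testing
# (store 2 never adopts) and a true testing effect of +2 percentage points.
store = np.repeat([0, 1, 2], 4)
month = np.tile([1, 2, 3, 4], 3)
adoption_month = np.array([2, 3, 99])[store]
tested = (month >= adoption_month).astype(float)
share = (0.02 * tested
         + np.array([0.10, 0.20, 0.30])[store]
         + np.array([0.0, 0.01, -0.02, 0.03])[month - 1])
print(fe_lpm_testing_effect(share, tested, store, month))  # approximately 2.0
```

Explicit dummies are fine at this toy scale; with 1,363 stores one could instead demean the outcome and regressors within store while keeping the month dummies, which by the Frisch-Waugh-Lovell theorem yields the same coefficient on the tested dummy.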

Panel B of Table 8 contains estimates of equation (14) where the dependent variable is multiplied
by 100 so that coefficients may be read as percentage points. In all cases, the impact of testing on
hiring rates by race is precisely estimated and close to zero. The point estimates imply that testing
raised white hire rates by 0.5 percentage points and reduced black and Hispanic hiring rates by 0.2
and 0.1 percentage points, respectively. None of these effects is significant. The third panel of Table 8
reports instrumental variable versions of these estimates, using stores' adoption of testing as an instrument
for applicants' tested status. These IV estimates are similar to the corresponding OLS estimates.

Marginal effects are calculated as $\partial \Pr(H)/\partial T = \Pr(H) \cdot (1 - \Pr(H)) \cdot \beta_T$.
Point estimates for these three categories do not sum to zero since there is a small number of 'other' race workers in
the sample.
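The marginal-effect formula in the note above is the derivative of the logistic CDF multiplied by the coefficient. The short Python sketch below evaluates it and, for comparison, the exact change in probability when the logit index shifts by the coefficient; the function names and inputs are hypothetical.

```python
import math


def logit_marginal_effect(prob, beta):
    """dPr/dT = Pr * (1 - Pr) * beta, the logistic-CDF derivative times beta."""
    return prob * (1.0 - prob) * beta


def logit_discrete_change(prob, beta):
    """Exact change in probability when the logit index rises by beta,
    starting from a baseline probability prob."""
    index = math.log(prob / (1.0 - prob))
    return 1.0 / (1.0 + math.exp(-(index + beta))) - prob


# Hypothetical inputs: 20 percent baseline probability, coefficient of -0.1.
print(100 * logit_marginal_effect(0.20, -0.10))  # about -1.6 percentage points
print(100 * logit_discrete_change(0.20, -0.10))  # about -1.55 percentage points
```

For small coefficients like those in Table 8, the derivative and the exact discrete change are close, so the choice between them matters little.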
Earlier,  we  calculated  that  testing  could  potentially  lower  the  hiring  rate  of  black  and  Hispanic
applicants  by  2.5  and  1.0  percentage  points  respectively.  Table  8  strongly  suggests  that  this  did  not 
occur:  we  can  reject  disparate  impacts  of  this  magnitude  with  well  over  99  percent  confidence. 

5.3     Disparate  hiring  impacts:  A  second  test 

Since these results are central to our conclusions, we test their robustness by analyzing a complementary
source of variation. As we show below, there is a tight link between the neighborhoods in which
stores  operate  and  the  race  of  workers  that  they  hire:  stores  in  minority  and  low-income  zip  codes 
hire a disproportionate share of minority workers. We can use this link to explore whether the introduction
of testing systematically changed the relationship between stores' neighborhood demographics
and  the  race  of  hires.  Specifically,  we  estimate  a  version  of  equation  (14)  augmented  with  measures 
of  the  minority  share  or  median  income  of  residents  in  the  store's  zip  code,  calculated  from  the  2000 
U.S.  Census.  We  first  estimate  this  model  separately  for  tested  and  non-tested  hires  at  each  store 
(excluding  site  effects)  to  assess  the  cross-sectional  relationship  between  zip  code  characteristics  and 
the  race  of  hires.  We  next  test  formally  if  job  testing  changed  this  relationship. 

Table 9 contains estimates. Column 1 of the first panel documents a close correspondence between
the race of neighborhood residents and the race of hires. The coefficient of -86.8 (t = 38) on the
non-white residents variable indicates that, prior to the use of testing, a store situated in an entirely
non-white zip code would be expected to have 88 percent non-white hires. Column 2 shows the
analogous estimate for tested hires. The point estimate of -85.6 indicates that the relationship
between store location and worker race was little changed by employment testing.

Columns  3  and  4  make  this  point  formally.  When  we  pool  tested  and  non-tested  hires  and  add 
an  interaction  between  the  test  dummy  and  the  share  of  non-white  residents  in  the  zip  code,  the 
interaction  term  is  close  to  zero  and  insignificant.  When  site  dummies  are  added  in  column  4  -  thus 
absorbing  the  main  effect  of  zip  code  share  non-white  residents  while  retaining  the  interaction  term  - 
the  interaction  term  is  again  close  to  zero.  Subsequent  columns,  which  repeat  this  exercise  for  black 
and  Hispanic  hires,  confirm  these  patterns. 

Panel  B  performs  analogous  estimates  for  the  racial  composition  of  hires  using  neighborhood 
household  income  in  place  of  zip  code  minority  share.  In  the  pre-testing  period,  stores  in  more 
affluent zip codes had a substantially larger share of white employees; 10 additional log points in
neighborhood  household  income  is  associated  with  a  3.2  percentage  point  higher  share  of  white  hires. 
Employment  testing  does  not  appear  to  have  altered  this  link.  For  all  race  groups,  and  for  both 
measures  of  neighborhood  demographics,  the  pre-post  change  in  the  relationship  between  neighborhood 
characteristics  and  the  race  of  hires  is  insignificant. 
On net, despite sizable racial differences in test scores, we find no evidence that job testing had
disparate racial impacts on hiring at the 1,363 stores in our sample. This evidence concords with our
earlier finding that testing did not differentially raise productivity of minority hires. As underscored
by our model, if prior to testing screening was blind to the information revealed by the test, disparate
impacts on both hiring and productivity are likely. The fact that neither type of disparate impact
occurred strongly suggests that prior to testing, firms in our sample had 'rational expectations';
that is, they statistically discriminated. The fact that firms had rational expectations does not imply,
however, that the screening provided by the test was redundant; the fact that productivity rose proves
otherwise. Rather, it suggests that testing raised productivity by improving selection within observable
race groups. Between-group differences, while sizable, were already implicitly taken into account
by the informal screen.

6     Conclusion 

An influential body of research concludes that the use of standardized tests for employment screening
poses an intrinsic equity-efficiency trade-off: raising productivity through better selection comes at a
cost of screening out minority applicants. This inference rests on the presumption that in the absence
of standardized tests, employers do not already account for expected productivity differences among
applicants from different demographic groups. Accordingly, a test that reveals these differences will
disproportionately reduce hiring (and improve productivity) of workers from low-scoring groups. In a
competitive hiring environment, however, this may not be the most relevant case. If, absent testing,
employers already account for expected productivity differences among applicant groups, it is possible
for employment testing to improve selection without adversely affecting equity. The reason is that
the gains from testing may primarily accrue from selecting better candidates within applicant groups
rather than from reducing hiring of groups with lower average scores.

We studied the evidence for an equity-efficiency trade-off in employment testing at a large,
geographically dispersed retail firm whose 1,363 stores switched over the course of 12 months from
informal, paper-based hiring to a computer-supported screening process that relies heavily on a
standardized personality test. We found that the move to employment testing increased productivity at
treated stores, raising mean and median employee tenure by 10 percent, and slightly lowering the
frequency of terminations for cause. Consistent with expectations, minority applicants performed
significantly worse on the employment test. Had the pre-testing hiring screen been 'blind' to the expected
productivity differences revealed by the test, we calculated that employment testing would have
reduced minority hiring by approximately 10 to 25 percent. This did not occur. We found no evidence
that employment testing changed the racial composition of hiring at this firm's 1,363 sites. Moreover,
productivity gains were equally large among minority and non-minority hires. The combination of
uniform productivity gains and no disparate hiring impacts suggests that employers were effectively
statistically discriminating prior to the introduction of employment testing. Consequently, the gain
from improved selection came at no measurable cost in equity.

Several  caveats  apply  to  our  results.  First,  our  data  are  from  only  one  large  retailer.  Since  retail 
firms  in  the  U.S.  operate  in  a  competitive  environment,  we  might  anticipate  that  other  firms  would 
respond  similarly.  However,  analysis  of  other  cases  is  needed  before  general  conclusions  can  be  drawn. 
A  second  caveat  is  that  the  between-group  differences  found  by  the  employment  test  used  at  this 
firm  are  not  as  large  as  differences  found  on  other  standard  ability  tests,  such  as  the  Armed  Forces 
Qualification  Test.  An  alternative  employment  test  that  revealed  larger  group  productivity  differences 
might potentially generate disparate impacts. Although we do not discount this possibility, there
are two reasons to believe it is not a first-order concern. First, we generally expect that employers
will account for expected group productivity differences; a test that reveals large disparities on some
measure should not necessarily generate large surprises. Second, employment testing guidelines issued
by the Equal Employment Opportunity Commission make it difficult, and potentially risky, for firms
to use employment tests that 'pass' minority applicants at less than 80 percent of the pass rate of
non-minority applicants. We therefore do not expect typical employment tests to show greater group
differences than those found here.

A  final  caveat  for  interpretation  is  that  our  results  speak  only  to  firms'  private  gains  from  improved 
worker  selection.  The  extent  to  which  these  private  gains  translate  into  social  benefits  depends  on  the 
mechanism by which testing improves selection. If testing improves the quality of matches between
workers and firms, the attendant gains in allocative efficiency are likely to raise social welfare. By
contrast, if testing primarily redistributes 'desirable' workers among competing firms where they would
have comparable marginal products, social benefits will be decidedly smaller than private benefits (cf.
Stiglitz, 1975; Lazear, 1986; Masters, 2004). Moreover, since testing is itself costly, the net social
benefits in the pure screening case could well be negative. Though our results provide little guidance
as to which of these scenarios is more relevant, it appears unlikely that social benefits from testing
exceed  the  private  benefits.  Quantifying  these  social  benefits  remains  an  important  topic  for  future 
work. 


This is referred to in the EEOC's Uniform Guidelines on Employee Selection Procedures (1978) as the "Four-Fifths"
rule. The test used at this firm was evaluated for "Four-Fifths" compliance. Had it failed, it would likely have been
modified.
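The four-fifths screen reduces to a comparison of selection-rate ratios. The Uniform Guidelines use the rate of the group with the highest selection rate as the benchmark, which is what the hypothetical Python function below does; the group labels and rates are illustrative, and the sketch covers only the arithmetic screen, not a full compliance assessment.

```python
def four_fifths_flags(selection_rates, threshold=0.8):
    """Flag groups whose selection (pass) rate is below `threshold` times
    the highest group's rate.

    selection_rates maps a group label to the share of that group's
    applicants who pass. Returns (impact ratios, sorted flagged groups).
    """
    top = max(selection_rates.values())
    ratios = {g: r / top for g, r in selection_rates.items()}
    flagged = sorted(g for g, ratio in ratios.items() if ratio < threshold)
    return ratios, flagged


# Hypothetical pass rates, not taken from this firm's test.
ratios, flagged = four_fifths_flags({"group_1": 0.60, "group_2": 0.50, "group_3": 0.45})
print(flagged)  # ['group_3']
```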
7     Appendix  1:  Proofs  of  selected  propositions

Proposition 5 (Naive selection case) Testing raises hiring of a relative to b applicants ($\Delta\psi_n < 0$).
A constant hiring rate implies that


where $z_x = \partial[\gamma_0(\bar{\eta}_x - \kappa_n(\gamma_0))/\sigma_\eta]/\partial\gamma = [\bar{\eta}_x - \kappa_n(\gamma_0) - \gamma_0\kappa_n'(\gamma_0)]/\sigma_\eta$. Noting that $\phi(\cdot) > 0$, the above
expression implies that either $z_a > 0$ and $z_b < 0$, or $z_a < 0$ and $z_b > 0$. Since $\kappa_n' < 0$ (prior
proposition), only the first pair of inequalities can be satisfied: $z_a > 0$ and $z_b < 0$. Applying these
inequalities to $\psi_n$ (equation (2)) yields:


^70 

Hence, an increase in screening precision generates a disparate negative impact on b hiring: $\Delta\psi_n < 0$.
Proposition 6 (Naive selection case) Testing raises the productivity of both a and b hires.
The expected productivity of hires at firms using naive selection is


The  impact  of  testing  on  the  productivity  of  hires  is: 

dE{r]\H,x)  /^7(«n(7)-??x)^    ,  ^/  /'7K  (7)  - '7x)^  /" «^n  (7)  -  ^x  +  7^n  (7) ^         n^;^ 

As shown in the text: $\kappa_n'(\gamma) < 0$, $\kappa_n(\gamma) - \bar{\eta}_b + \gamma\kappa_n'(\gamma) > 0$, and $\kappa_n(\gamma) - \bar{\eta}_a + \gamma\kappa_n'(\gamma) < 0$. Noting that
$\lambda(\cdot), \lambda'(\cdot) > 0$, equation (15) is positive for b hires. Hence, b productivity rises. To show that equation
(15) is also positive for a hires, we use the fact that $\gamma\kappa_n'(\gamma) > \bar{\eta}_b - \kappa_n(\gamma)$, and substitute into equation
(15):


dE{r^\H,a)     ^ 

\fli'^n{l)-Va)\     1     y  /7(Kr^(7)-??a)^ 

h  («n  (7)  -  »?«  +  «"  (7)  +  %)  Y 

=       <Tr, 

\  f  1  {f^n  il)  -  ria)\     1     ^/  /7(«^n(7)-'7a)\ 

fl{Tlb-r]a)\ 

■ 

Since $\gamma(\kappa_n(\gamma) - \bar{\eta}_a) > \gamma(\bar{\eta}_b - \bar{\eta}_a)$ and (using the Inverse Mills Ratio) $\lambda(x) > \lambda'(x)\,x$ for $x > 0$, the
right hand side of this equation is weakly positive, which establishes that $\partial E(\eta \mid H, a)/\partial\gamma > 0$.
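The inequality $\lambda(x) > \lambda'(x)\,x$ for $x > 0$ can be checked numerically. The Python sketch below assumes the convention $\lambda(x) = \phi(x)/(1 - \Phi(x))$ and uses the identity $\lambda'(x) = \lambda(x)(\lambda(x) - x)$; a grid check illustrates the result but does not prove it (the proof follows from $0 < \lambda'(x) < 1$ and $\lambda(x) > x$). The function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def inverse_mills(x):
    """lambda(x) = phi(x) / (1 - Phi(x)), the mean of a standard normal
    truncated from below at x (the IMR convention assumed here)."""
    return normal_pdf(x) / (0.5 * math.erfc(x / math.sqrt(2.0)))


def inverse_mills_deriv(x):
    """lambda'(x) = lambda(x) * (lambda(x) - x)."""
    lam = inverse_mills(x)
    return lam * (lam - x)


# Check lambda(x) > lambda'(x) * x on a grid of positive arguments.
grid = [0.01 * i for i in range(1, 1001)]  # 0.01, 0.02, ..., 10.0
print(all(inverse_mills(x) > inverse_mills_deriv(x) * x for x in grid))  # True
```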

Proposition 7 (Naive selection case) Testing raises the productivity of b relative to a hires
($\Delta\pi_n > 0$).

Productivity  at  naive  firms  is: 


E(7,|.)  =  ^.  +  7^.a(^^-"(J)-^-)) 
The  effect  of  testing  on  productivity  is:


The first parenthetical expression is larger (more positive) for b's since the numerator is declining in
$\bar{\eta}_x$ and $\lambda'(\cdot) > 0$. The second parenthetical expression is negative for a's and positive for b's (see
Proposition 2). Hence, $\partial^2 E(\eta \mid x)/\partial\gamma\,\partial\bar{\eta}_x < 0$, which implies $\partial\pi_n/\partial\gamma > 0$.

Proposition 9 (Statistical discrimination case) Testing raises hiring of b relative to a applicants
($\Delta\psi_s > 0$).

A constant hiring rate implies:

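$$\frac{\partial \Pr(H)}{\partial \gamma} = y_b\,\phi\left(\frac{\kappa_s(\gamma_0) - \bar{\eta}_b}{\gamma_0\sigma_\eta}\right) + y_a\,\phi\left(\frac{\kappa_s(\gamma_0) - \bar{\eta}_a}{\gamma_0\sigma_\eta}\right) = 0,$$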

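where $y_x = \partial[(\bar{\eta}_x - \kappa_s(\gamma_0))/\gamma_0\sigma_\eta]/\partial\gamma = [\kappa_s(\gamma_0) - \bar{\eta}_x - \gamma_0\kappa_s'(\gamma_0)]/\gamma_0^2\sigma_\eta$. Since $\mathrm{Sign}(y_a) \neq \mathrm{Sign}(y_b)$ and
$\kappa_s' > 0$, the above equation implies that $y_a < 0$ and $y_b > 0$. Applying these inequalities to $\psi_s$ gives: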

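$$\frac{\partial \psi_s}{\partial \gamma} = y_b\,\phi\left(\frac{\kappa_s(\gamma_0) - \bar{\eta}_b}{\gamma_0\sigma_\eta}\right) - y_a\,\phi\left(\frac{\kappa_s(\gamma_0) - \bar{\eta}_a}{\gamma_0\sigma_\eta}\right) > 0.$$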

Proposition  10  (Statistical  discrimination  case)  Testing  raises  the  productivity  of  both  a  and  b 
hires. 

The  expected  productivity  of  hires  at  firms  using  statistical  discrimination  is 


The  effect  of  screening  precision  on  the  productivity  of  hires  is: 

As shown in the text: $\kappa_s'(\gamma) > 0$, $\gamma\kappa_s'(\gamma) < \kappa_s(\gamma) - \bar{\eta}_b$, and $\gamma\kappa_s'(\gamma) > \kappa_s(\gamma) - \bar{\eta}_a$. Equation (16) is
positive for a hires; a productivity rises. To show that equation (16) is also positive for b hires, we
substitute the last of these inequalities ($\gamma\kappa_s'(\gamma) > \kappa_s(\gamma) - \bar{\eta}_a$) for $\gamma\kappa_s'(\gamma)$ in equation (16):


^  j  ^5(7)  -r\h\  _yi  ( *^s  (7)  ■-'r\x\  fVa  -  Vb 


dE{rj\H,b)  ^ 

' a >  ^'J        ,  ,  , 

<^7  L    \       70-7)       y  V       70'7,       J  \    i<yr, 

Since $\kappa_s(\gamma) - \bar{\eta}_b > \bar{\eta}_a - \bar{\eta}_b$ and $\lambda(x) > \lambda'(x)\,x$ for $x > 0$, the right hand side of this equation is weakly
positive, which establishes that $\partial E(\eta \mid H, b)/\partial\gamma > 0$.

Proposition 11 (Statistical discrimination case) Testing increases the productivity gap between
a and b hires, but this effect is very small ($\Delta\pi_s \approx 0$).
Differentiating equation (16) with respect to $\bar{\eta}_x$ gives:

d'^Eirj\H,x)  ^  ^„  (ns{-i)  -  7/^"\  (f,^  -  k^  (7)  +  7< 


This expression is weakly negative for b's and weakly positive for a's, implying that $\partial\pi_s/\partial\gamma < 0$. Note,
however, that the second derivative of the Inverse Mills Ratio is extremely shallow at all points and
asymptotes to zero as the argument of the IMR becomes large. Hence, we conclude that $\partial\pi_s/\partial\gamma \approx 0$.
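The claim that the second derivative of the Inverse Mills Ratio is shallow and vanishes for large arguments can be examined numerically. The Python sketch below assumes the convention $\lambda(x) = \phi(x)/(1 - \Phi(x))$ and differentiates the identity $\lambda' = \lambda(\lambda - x)$ once more; the function names and evaluation points are arbitrary choices for illustration.

```python
import math


def inverse_mills(x):
    """lambda(x) = phi(x) / (1 - Phi(x)) (assumed IMR convention)."""
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(x / math.sqrt(2.0)))


def inverse_mills_second_deriv(x):
    """lambda''(x), from lambda' = lambda * (lambda - x), which gives
    lambda'' = lambda' * (lambda - x) + lambda * (lambda' - 1)."""
    lam = inverse_mills(x)
    d1 = lam * (lam - x)
    return d1 * (lam - x) + lam * (d1 - 1.0)


# The curvature stays small everywhere and shrinks toward zero in both tails.
for x in (-8.0, -2.0, 0.0, 2.0, 5.0, 10.0):
    print(x, round(inverse_mills_second_deriv(x), 4))
```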

8     Appendix 2: The relationship between average applicant test
scores and store level productivity

Our analysis of applicant test scores in Sections 4 and 5 draws on a database of 214,688 applications
submitted to the 1,363 stores in our sample during the year after the rollout of employment testing.
If these applications are not representative of applications submitted during the time-frame of our
employment sample, we might either under- or overstate the expected effect of employment testing on
productivity and hiring (though this would have no bearing on our estimation of the actual effect of
testing on productivity or hiring in Tables 3-9).

To  explore  this  concern,  we  estimate  in  Appendix  Table  3  a  set  of  models  for  the  relationship 
between  the  mean  employment  test  score  of  a  store's  applicants  and  the  job  spell  durations  of  workers 
hired  at  that  store: 

$$D_{ijt} = \alpha + X_{ijt}\beta_0 + \beta_{12} S_j + \beta_{11} T_{jt} + \beta_{13} S_j \times T_{jt} + \theta_t + \varphi_j + \varepsilon_{ijt}. \quad (17)$$

Here, the dependent variable is the completed job spell duration of workers hired at each store $j$, and $S_j$
is the average test score of store $j$'s applicants. All models include either state effects or site effects and
control  for  gender,  race,  year-month  of  hire  and  in  some  specifications,  zip-code  demographic  variables 
(as  in  Table  9).  This  model  is  identical  to  equation  (8)  in  the  text,  except  that  it  is  estimated  with 
outcome  variables  for  both  tested  and  non-tested  hires  and  includes  interactions  between  tested  status 
and  mean  store-level  test  scores. 

If our applicant database accurately captures the characteristics of stores' applicant pools before
and after the use of testing, we should expect two relationships: stores with lower average test scores
should have lower productivity hires (that is, shorter job durations) ($\beta_{12} > 0$); and productivity gains
from employment testing should be larger for stores with lower average test scores since, absent the
test, a greater share of hires at these stores would be expected to be of low productivity ($\beta_{13} < 0$).

Repeating  column  1  of  Table  5,  the  first  column  of  the  appendix  table  shows  a  sizable,  positive 
relationship  between  the  test  scores  of  applicants  and  the  quality  of  hires  in  the  pre-testing  regime.  The 
coefficient of 2.73 (t = 5.0) on the mean test score variable indicates that, conditional on race, gender,
time and state effects, a 1 point higher average test score among a store's applicants predicts 2.7
additional days of job duration for the store's non-tested hires. Controlling for minority resident share
and median household income in the store's zip code raises the coefficient on the mean test score slightly to
3.2 days (t = 4.5). Hence, a one-standard-deviation (3.7 point) difference in average store-level test
scores predicts a 12 day difference in mean job duration.

In columns 3 and 4, we estimate equation (8) for the sample of workers hired using the employment
test. Because this group of hires was selected using the test, we expect to find a weaker test-tenure
relationship here. This expectation is confirmed. The coefficient on the average applicant test score
is only half as large for the tested relative to non-tested sample, and it is insignificant. When we pool
all hires and add an interaction term between the store's mean applicant test score and a dummy
variable indicating whether a worker was hired using employment testing, we find (column 6) that
mean applicant test scores are much less predictive of productivity for workers hired using the test
than for those hired without it: 1.7 versus 3.3 days of tenure gain per additional test point.

In column 7, we add site fixed effects. These absorb the main effect of applicant test scores but
identify the interaction term. Consistent with prior columns, the gains to testing depend upon baseline
applicant characteristics. While the (employment-weighted) mean store in our sample gains 18.7 days
of tenure from employment testing, a store whose applicants are 5 points below average
gains 25.0 days of tenure and a store whose applicants are 5 points above average gains
12.5 days of tenure. Hence, where applicants are of lower average quality, employment testing has
greater potential to add value by screening out unproductive hires.
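The three tenure gains quoted above imply a rough interaction slope. The back-of-envelope Python sketch below recovers it under the assumption that the testing gain is linear in the store's mean applicant score; the inputs are the rounded figures from the text, so the recovered slope is approximate and is not the Appendix Table 3 coefficient itself. The function names are hypothetical.

```python
def implied_interaction_slope(gain_below, gain_above, distance):
    """Change in the testing gain (days) per test-score point, assuming the
    gain is linear in the store's mean applicant score."""
    return (gain_above - gain_below) / distance


def predicted_gain(mean_gain, slope, deviation):
    """Testing gain for a store whose applicants score `deviation` points
    above (positive) or below (negative) the average store."""
    return mean_gain + slope * deviation


# Rounded figures from the text: 25.0 days at 5 points below average,
# 12.5 days at 5 points above, 18.7 days at the mean.
slope = implied_interaction_slope(25.0, 12.5, 10.0)
print(slope)                              # -1.25 days per test point
print(predicted_gain(18.7, slope, -5.0))  # close to 25.0
print(predicted_gain(18.7, slope, 5.0))   # close to 12.5
```

The small mismatch at the two ends (24.95 and 12.45 days) reflects rounding in the reported figures rather than nonlinearity.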

These  findings  -  stores  with  higher  applicant  test  scores  had  substantially  higher  productivity 
before  the  adoption  of  employment  testing  and  stores  with  weaker  applicant  pools  experienced  greater 
productivity  gains  -  suggest  that  the  applicant  database  used  for  our  analysis  may  provide  a  reasonable 
characterization of applicant characteristics in the period when employment testing was adopted.
Figure 1. Density of Applicant Test Scores
Sample: All white, black and Hispanic applicants, June 2000 - May 2001 (n = 189,067)

Figure 3. Density of Completed Job Spell Durations of Tested and Non-Tested Hires.
Sample: All workers hired January 1999 - May 2000 (n = 34,247).

Figure 4. Employment Status of Workers during First 360 Days Following Hire.
Sample: Hires June 2000 - May 2001 with Valid Outcome Data (n = 33,411)

Figure 5. Conditional Probability of Hire as a Function of Test Score by Race:
Locally Weighted Regressions

Figure 6. Test Score Densities of Hired Workers by Race


Table 1. Race and Gender Characteristics of Tested and Non-Tested Hires

Panel A: Frequencies

            Full Sample            Non-Tested Hires       Tested Hires
            Frequency  % of Total  Frequency  % of Total  Frequency  % of Total
All            34,247       100.0     25,820       100.0      8,427       100.0
White          23,560        68.8     18,057        69.9      5,503        65.3
Black           6,262        18.3      4,591        17.8      1,671        19.8
Hispanic        4,102        12.0      2,913        11.3      1,189        14.1
Male           17,604        51.4     13,135        50.9      4,469        53.0
Female         16,643        48.6     12,685        49.1      3,958        47.0

Panel B: Employment spell duration (days)

            Full Sample            Non-Tested Hires       Tested Hires
               Mean      Median       Mean      Median       Mean      Median
All           173.7          99       173.3          96       174.9         107
              (1.9)   [97, 100]       (2.1)    [94, 98]       (3.0)  [104, 110]
White         184.0         106       183.0         102       187.1         115
              (2.1)  [103, 108]       (2.3)  [100, 105]       (3.6)  [112, 119]
Black         140.1          77       138.1          74       145.7          87
              (3.0)    [75, 80]       (3.5)    [71, 77]       (4.8)    [82, 92]
Hispanic      166.4          98       169.3          98       159.5          99
              (4.6)   [93, 103]       (5.4)   [92, 104]       (6.4)   [90, 106]

Panel C: Percent still working and terminated for cause after 180 days

            Full Sample            Non-Tested Hires       Tested Hires
                     Term for               Term for               Term for
            Working     cause      Working     cause      Working     cause
All            32.6      22.4         32.2      21.5         34.0      25.2
              (0.4)     (0.4)        (0.5)     (0.4)        (0.7)     (0.7)
White          34.9      19.4         34.3      18.7         36.9      21.5
              (0.5)     (0.4)        (0.5)     (0.4)        (0.8)     (0.7)
Black          25.0      32.5         24.4      31.5         26.9      35.6
              (0.7)     (0.8)        (0.8)     (0.9)        (1.3)     (1.5)
Hispanic       31.2      24.0         31.3      22.4         31.1      27.9
              (1.0)     (0.8)        (1.1)     (0.9)        (1.7)     (1.6)

Table Notes:

-Sample includes workers hired between Jan 1999 and May 2000.

-Mean tenures include only completed spells (98% of spells completed). Median tenures include complete and incomplete spells.

-Standard errors in parentheses account for correlation between observations from the same site (1,363 sites total). 95 percent confidence intervals for medians given in brackets.

-In Panel C, the omitted outcome category is Terminated not for Cause, equal to one minus [fraction still working + fraction terminated for cause].
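Because the three Panel C outcomes are exhaustive, the omitted share (terminated not for cause) can be recovered from the two reported columns. The short Python sketch below does that arithmetic for every cell of Panel C; the dictionary layout and the name `term_not_for_cause` are illustrative choices rather than part of the original analysis, and the figures are copied from the table.

```python
# Table 1, Panel C: (percent still working, percent terminated for cause) at 180 days.
PANEL_C = {
    "All":      {"Full": (32.6, 22.4), "Non-tested": (32.2, 21.5), "Tested": (34.0, 25.2)},
    "White":    {"Full": (34.9, 19.4), "Non-tested": (34.3, 18.7), "Tested": (36.9, 21.5)},
    "Black":    {"Full": (25.0, 32.5), "Non-tested": (24.4, 31.5), "Tested": (26.9, 35.6)},
    "Hispanic": {"Full": (31.2, 24.0), "Non-tested": (31.3, 22.4), "Tested": (31.1, 27.9)},
}


def term_not_for_cause(working: float, for_cause: float) -> float:
    """Omitted category: 100 minus the two reported percentages, rounded to 0.1."""
    return round(100.0 - (working + for_cause), 1)


for group, samples in PANEL_C.items():
    shares = {name: term_not_for_cause(*pair) for name, pair in samples.items()}
    print(group, shares)
```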


Table 2. Test Scores and Hire Rates by Race and Gender for Tested Subsample

A. Test Scores of Applicants (range 0 to 100)

                            Percent in each category
              Mean      SD      Red    Yellow    Green
All           51.3    28.8     23.2      24.8     52.0
White         53.1    28.6     20.9      24.5     54.6
Black         47.7    29.0     27.8      25.2     47.1
Hispanic      49.6    28.6     24.9      25.6     49.6
Male          50.8    29.3     24.4      24.3     51.3
Female        51.8    28.1     21.6      25.5     52.9

B. Test Scores of Hires (range 0 to 100)

                            Percent in each category
              Mean      SD      Red    Yellow    Green
All           71.9    20.7      0.2      16.3     83.5
White         72.3    20.4      0.1      15.7     84.2
Black         70.8    20.8      0.4      16.4     83.2
Hispanic      71.7    20.6      0.1      17.3     82.6
Male          71.0    20.7      0.2      15.0     84.7
Female        72.9    20.6      0.2      17.5     82.3

C. Hire Rates by Applicant Group

By Race and Gender              By Test Score Decile
Race/Sex   % Hired       Obs    Decile  % Hired      Obs
All           8.90   214,688         1     0.09   21,784
White        10.16   113,354         2     0.09   21,977
Black         7.17    43,314         3     3.38   20,836
Hispanic      7.12    32,399         4     5.60   24,198
Male          8.57   112,669         5     7.99   21,589
Female        9.27   102,019         6    11.01   20,471
                                     7    11.62   21,096
                                     8    13.74   20,214
                                     9    16.11   21,814
                                    10    20.72   20,709

Table Notes:

- N=214,688 applicants and 19,107 hires at 1,363 sites.

- Sample includes all applicants and hires between June 2000 and May 2001 at sites used in the treatment sample.
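As a consistency check on Panel C, the overall hire rate is hires divided by applicants, and the decile rows should add back up to the totals in the notes. The sketch below (variable names are illustrative; figures copied from the table) performs both checks. The hires implied by the decile rows differ slightly from 19,107 only because the published percentages are rounded to two decimals.

```python
# Table 2, Panel C: applicants and percent hired by test-score decile.
DECILE_OBS = [21_784, 21_977, 20_836, 24_198, 21_589, 20_471, 21_096, 20_214, 21_814, 20_709]
DECILE_PCT_HIRED = [0.09, 0.09, 3.38, 5.60, 7.99, 11.01, 11.62, 13.74, 16.11, 20.72]
APPLICANTS, HIRES = 214_688, 19_107  # totals from the table notes

overall_pct = 100 * HIRES / APPLICANTS
# Hires implied by the decile rows (rounded percentages times applicant counts).
implied_hires = sum(pct / 100 * n for pct, n in zip(DECILE_PCT_HIRED, DECILE_OBS))

print(f"overall hire rate: {overall_pct:.2f}%")
print(f"decile applicants sum to {sum(DECILE_OBS):,}; implied hires = {implied_hires:,.0f}")
```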


Table  3.  OLS  and  IV  Estimates  of  the  Effect  of  Job  Testing  on  the  Job  Spell 

Duration  of  Hires 

Dependent  Variable:  Length  of  completed  employment  spell  (days) 

(1)        (2)        (3)        (4)        (5)        (6)         (7)       (8)        (9)     (10) 

A.  OLS  Estimates  B.  IV  Estimates 

Employment  8.8     18.8     18.7     22.1        6.2   15.0    14.9    18.1 

test  (4.5)     (4.0)     (4.0)     (4.3)     (5.1)  (4.6)    (4.6)    (5.0) 

Black  -43.4    -25.6  -25.6    -25.5  -25.6  -25.5 

(3.2)     (3.4)  (3.4)     (3.4)  (3.4)    (3.4) 

Hispanic  -17.5    -11.9  -11.9    -11.9  -11.9  -11.9 

(4.4)     (4.1)  (4.1)     (4.1)  (4.1)    (4.1) 

Male  -4.1      -1.9  -1.9      -1.8  -1.9     -1.8 

(2.4)     (2.4)  (2.4)     (2.4)  (2.4)    (2.4) 

Site  effects  No     Yes       No     Yes     Yes     Yes       No    Yes     Yes     Yes 

State  trends  No       No       No       No       No     Yes       No     No      No     Yes 

R-squared         0.011   0.109  0.005  0.108  0.109  0.111 

Table  Notes: 

-N=33,588 

-Robust  standard  errors  in  parentheses  account  for  correlation  between  observations  from 

the  same  site  hired  under  each  screening  method  (testing  or  no  testing). 

-All models include controls for month-year of hire.

-Sample includes workers hired Jan 1999 through May 2000 at 1,363 sites.

-Instrument for worker receiving employment test in columns 7-10 is an indicator variable equal to one if site has begun testing.
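With a single binary instrument (site has begun testing) for a binary treatment (hire was tested), the IV estimate without covariates reduces to the Wald ratio: the reduced-form difference in spell length divided by the first-stage difference in the probability of being tested. The Python sketch below illustrates that ratio on simulated data. It omits the site fixed effects, month-year controls, and state trends used in Table 3, and its data-generating process (take-up rates, noise, and an 18-day effect) is a made-up assumption chosen only to show why a naive tested-versus-untested comparison can be biased when take-up depends on unobserved worker quality.

```python
import random
from statistics import fmean


def wald_iv(z, d, y):
    """IV estimate with one binary instrument: reduced form / first stage."""
    y1 = fmean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = fmean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = fmean(di for zi, di in zip(z, d) if zi == 1)
    d0 = fmean(di for zi, di in zip(z, d) if zi == 0)
    return (y1 - y0) / (d1 - d0)


def simulate(n=200_000, effect=18.0, seed=0):
    """Hypothetical hires: z = site has begun testing, d = hire was tested, y = spell days."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = int(rng.random() < 0.5)
        u = rng.gauss(0.0, 150.0)  # unobserved worker quality, independent of z
        # At adopting sites, above-average hires are more likely to be tested,
        # so tested and untested hires differ in u as well as in d.
        take_up = 0.95 if u > 0 else 0.77
        di = int(zi == 1 and rng.random() < take_up)
        z.append(zi)
        d.append(di)
        y.append(170.0 + effect * di + u)
    return z, d, y


z, d, y = simulate()
naive = fmean(yi for di, yi in zip(d, y) if di) - fmean(yi for di, yi in zip(d, y) if not di)
print(f"naive tested-vs-untested gap: {naive:.1f} days")
print(f"Wald IV estimate: {wald_iv(z, d, y):.1f} days (simulated effect: 18)")
```

The ratio recovers the simulated effect because the instrument is independent of `u`, whereas the naive gap also picks up the selection on `u` among adopting sites.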


Table 4. Quantile Regression Estimates of the Effect of Job Testing on Job Spell Duration
Dependent  Variable:  Length  of  employment  spell  (days) 


iH 12}_ 


M. 


ilL 


(2)  (3)  (4)  (5) 


J6I 


A.  All  Spells 
Median 


Median 


B.  Completed  Spells 

10th       25th       75th       90th 


Employment 
test 

9.0 
(2.1) 

8.0 
(2.1) 

9.8 
(2.3) 

3.0 
(1.3) 

5.0 
(1.8) 

16.0 
(6.8) 

-1.8 
(12.8) 

Male 

2.0 
(1.2) 

2.0 
(1.2) 

3.0 
(1.4) 

3.0 
(1.3) 

2.0 
(0.7) 

3.0 
(1.1) 

-7.5 
(4.0) 

-12.5 
(7.5) 

Black 

-24.0 
(1.7) 

-24.0 
(1.7) 

-22.3 
(1.9) 

-22.2 
(1.8) 

-2.0 
(1.0) 

-7.0 
(1.5) 

-56.1 
(5.4) 

-102.8 
(10.1) 

Hispanic 

-10.0 
(2.0) 

-10.0 
(2.0) 

-9.3 
(2.3) 

-9.5 
(2.2) 

-1.0 
(1.2) 

-4.0 
(1.7) 

-20.8 
(6.4) 

-38.7 
(12.1) 

Obs 

34,200 

34,200 

34,200 

33,588 

33,588 

33,588 

33,588 

33,588 

33,588 

Table  Notes: 

-Standard  errors  in  parentheses. 

-All models include dummies for state and month-year of hire (not shown).

-Sample includes workers hired Jan 1999 through May 2000.

-Columns 5 through 10 present results only for completed spells. Columns 1-4 also include incomplete spells.
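Each coefficient in Table 4 measures a shift in a quantile of the spell-length distribution rather than in its mean. In the simplest case, a quantile regression on an intercept and a single 0/1 regressor, the check-loss objective separates by group, so the slope equals the difference between the two groups' sample quantiles. The sketch below illustrates only that simplified case. The models in the table also include state and month-year dummies, and the spell lengths in the example are invented.

```python
def check_loss(u: float, tau: float) -> float:
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(values, tau):
    """Return a data point minimizing total check loss, i.e. a sample tau-quantile."""
    return min(sorted(values), key=lambda c: sum(check_loss(v - c, tau) for v in values))


def dummy_qr_slope(treated, control, tau=0.5):
    """Slope on a 0/1 regressor in a tau-quantile regression with only an intercept.

    The objective separates by group, so the slope is the gap between the
    two groups' sample tau-quantiles.
    """
    return quantile_by_check_loss(treated, tau) - quantile_by_check_loss(control, tau)


# Hypothetical spell lengths (days) for tested and untested hires.
tested = [15, 40, 50, 120, 300]
untested = [10, 20, 30, 90, 250]
print(dummy_qr_slope(tested, untested, tau=0.5))
```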


Table 5. The Relationship between Site-Level Applicant Mean Test Scores and
the Job Spell Duration and Dismissal Status of Hired Workers

                          A. Job Spell  B. Employment Status at 180 Days
                              Duration                 Neutral  Termination
                                (days)   Employed  Termination    for Cause
Mean applicant test               2.73       0.31         0.11        -0.41
score at site                   (0.55)     (0.09)       (0.13)       (0.12)
Black                           -33.32      -4.97        -7.08        12.05
                                (3.99)     (0.63)       (0.97)       (0.99)
Hispanic                         -6.85      -1.91        -2.54         4.44
                                (5.48)     (0.90)       (1.15)       (1.00)
Male                             -5.79      -1.47        -3.17         4.64
                                (2.81)     (0.48)       (0.64)       (0.56)
State effects                      Yes        Yes          Yes          Yes
Month x year of                    Yes        Yes          Yes          Yes
hire effects
R-squared                        0.024      0.017        0.016        0.036
n                               25,347     25,252       25,252       25,252

Robust standard errors in parentheses account for error correlations between
observations from the same site (n = 1,363). Sample is workers hired at each site prior to
rollout of testing. Hire dates span January 1999 - May 2001. Mean applicant test scores by
store are calculated for the sample of all job applications submitted to sites during
June 2000 - May 2001 (n = 214,488).
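The site-level regressor in Table 5 is built from a different sample than the outcomes: applicant test scores from June 2000 to May 2001 are averaged by site and then attached to workers hired at that site before testing began. A minimal Python sketch of that construction follows. It assumes a hypothetical record layout, (site, score) pairs for applications and dictionaries for hires; the resulting column would then enter an OLS regression with state and month-year effects. Read at face value, the duration coefficient of 2.73 means that a site whose applicants average 10 points higher is associated with pre-testing hires whose spells are about 27 days longer (10 × 2.73).

```python
from collections import defaultdict
from statistics import fmean


def site_mean_scores(applications):
    """Map each site to the mean test score of its applicants.

    applications: iterable of (site_id, score) pairs, score on the 0-100 scale.
    """
    scores = defaultdict(list)
    for site, score in applications:
        scores[site].append(score)
    return {site: fmean(vals) for site, vals in scores.items()}


def attach_site_means(hires, means):
    """Add 'site_mean_score' to each hire record; None if the site has no applicants."""
    return [{**hire, "site_mean_score": means.get(hire["site"])} for hire in hires]


apps = [("A", 40), ("A", 60), ("B", 70)]
pre_testing_hires = [{"site": "A", "spell_days": 120}, {"site": "C", "spell_days": 30}]
print(attach_site_means(pre_testing_hires, site_mean_scores(apps)))
```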


Table 6. OLS and IV Estimates of the Effect of Job Testing on Job Spell Duration by Race and Gender
Dependent Variable: Length of employment spell (days)

                 Whites          Blacks        Hispanics         Males          Females
                 (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)

A. OLS Estimates
Employment      20.4    24.4    22.8    21.0     8.2    15.3    18.7    21.6    20.1    25.2
test           (5.2)   (5.5)   (9.3)  (10.1)  (13.1)  (13.7)   (5.8)   (6.2)   (6.0)   (6.4)
R-squared      0.121   0.124   0.231   0.238   0.303   0.311   0.147   0.150   0.160   0.164

B. Instrumental Variables Estimates
Employment      19.3    23.3    18.3    16.5     6.2    15.1    12.9    15.2    17.5    22.8
test           (5.9)   (6.4)  (11.3)  (12.5)  (14.4)  (15.4)   (6.4)   (7.1)   (6.8)   (7.3)

State trends      No     Yes      No     Yes      No     Yes      No     Yes      No     Yes
Obs           23,030  23,030   6,199   6,199   4,037   4,037  17,292  17,292  16,296  16,296

Table  Notes: 

-Robust  standard  errors  in  parentheses  account  for  correlation  between  observations  from  the  same  site  hired 

under  each  screening  method  (testing  or  no  testing). 

-All models include 1,363 site fixed effects and controls for month-year of hire, gender, and, in columns 7-10, race.

-Sample  includes  workers  hired  Jan  1999  through  May  2000. 

-Instrument  for  worker  receiving  employment  test  is  an  indicator  variable  equal  to  one  if  site  has  begun  testing. 


Table 7. OLS and IV Linear Probability Models for the Effect of Job Testing on Employment
Status 180 Days Following Hire
Dependent Variable: Dichotomous variable equal to 100 if worker has indicated status

                     (1) OLS                      (2) OLS                      (3) IV
                         Term     Term                Term     Term                Term     Term
                      not for      for             not for      for             not for      for
            Employed    cause    cause   Employed    cause    cause   Employed    cause    cause

Panel A: All Observations
Employment                                   4.44    -3.05    -1.39       3.73    -2.91    -0.82
test                                       (0.97)   (1.08)   (0.95)     (1.12)   (1.21)   (1.09)
Black          -5.68    -3.53     9.21      -5.66    -3.54     9.21      -5.67    -3.54     9.21
              (0.82)   (0.89)   (0.83)     (0.82)   (0.89)   (0.83)     (0.82)   (0.89)   (0.83)
Hispanic       -2.05    -0.95     3.00      -2.05    -0.95     3.00      -2.05    -0.95     3.00
              (0.97)   (1.05)   (0.88)     (0.97)   (1.05)   (0.88)     (0.97)   (1.05)   (0.88)
Male           -0.36    -3.58     3.94      -0.36    -3.58     3.94      -0.36    -3.58     3.94
              (0.54)   (0.59)   (0.48)     (0.54)   (0.59)   (0.48)     (0.54)   (0.59)   (0.48)
R-squared      0.100    0.079    0.108      0.100    0.079    0.108      0.100    0.079    0.108
Obs                    33,250                       33,250                       33,250

Panel B: Effects by Worker Race
                   White Hires                  Black Hires                Hispanic Hires
OLS             5.44    -3.73    -1.72       4.01    -2.58    -1.44       3.40    -1.44    -1.96
              (1.20)   (1.32)   (1.08)     (2.29)   (2.79)   (2.74)     (3.20)   (3.51)   (2.95)
IV              5.11    -4.16    -0.95       4.20    -1.11    -3.08       2.50    -1.84    -0.67
              (1.40)   (1.50)   (1.26)     (2.74)   (3.24)   (3.16)     (3.43)   (4.04)   (3.47)
Obs                    22,871                        6,070                        3,992

Table  Notes: 

-Robust  standard  errors  in  parentheses  account  for  correlation  between  observations  from  the  same 

site  hired  under  each  screening  method  (testing  or  no  testing). 

-All models include 1,363 site fixed effects and controls for month-year of hire.

-Sample  includes  workers  hired  Jan  1999  through  May  2000. 

-Instrument  for  worker  receiving  employment  test  is  an  indicator  variable  equal  to  one  if  site  has 

begun  testing. 
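Because employed, terminated not for cause, and terminated for cause are mutually exclusive and exhaustive, the three dependent variables sum to 100 for every worker. When the three models share the same regressors and sample, OLS and IV are linear in the dependent variable, so any slope coefficient must sum to zero across the three outcome columns, up to rounding. The Python check below applies this to Panel A; the same property holds for Table 5 (0.31 + 0.11 - 0.41 = 0.01). The dictionary labels are illustrative.

```python
# Table 7, Panel A coefficients in the order
# (employed, terminated not for cause, terminated for cause).
TABLE7_PANEL_A = {
    "Employment test, OLS": (4.44, -3.05, -1.39),
    "Employment test, IV": (3.73, -2.91, -0.82),
    "Black": (-5.68, -3.53, 9.21),
    "Hispanic": (-2.05, -0.95, 3.00),
    "Male": (-0.36, -3.58, 3.94),
}

for label, coefs in TABLE7_PANEL_A.items():
    # Outcomes are exhaustive, so each slope should sum to zero across the
    # three equations; any residual reflects rounding to two decimals.
    print(f"{label}: sum = {sum(coefs):+.2f}")
```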


Table 8. Conditional Logit and Linear Probability Models of the Effect of Job Testing on
Applicant Hiring Odds by Race
Dependent Variable: An indicator variable equal to 100 if hired worker is of given race

                           (1)       (2)       (3)       (4)       (5)       (6)
                         White     White     Black     Black  Hispanic  Hispanic

Panel A: Fixed Effects Logit Estimates
Employment test          0.035     0.028    -0.017     0.007    -0.017    -0.049
(logit coefficient)    (0.055)   (0.058)   (0.067)   (0.071)   (0.073)   (0.076)
State trends                No       Yes        No       Yes        No       Yes
Obs                     31,595    31,595    27,288    27,288    22,689    22,689

Panel B: OLS Estimates
Employment test           0.52      0.39     -0.21      0.04     -0.08     -0.13
(OLS coefficient)       (0.85)    (0.90)    (0.68)    (0.71)    (0.61)    (0.66)
State trends                No       Yes        No       Yes        No       Yes
Obs                     34,247    34,247    34,247    34,247    34,247    34,247

Panel C: Instrumental Variables Estimates
Employment test           0.89      0.82     -0.11      0.14     -0.57     -0.69
(IV coefficient)        (0.96)    (1.04)    (0.77)    (0.80)    (0.69)    (0.76)
State trends                No       Yes        No       Yes        No       Yes
Obs                     34,247    34,247    34,247    34,247    34,247    34,247


Table Notes:

-Standard errors in parentheses. For OLS and IV models, robust standard errors in parentheses account for correlations between observations from the same site.

-Sample includes workers hired Jan 1999 through May 2000.

-All models include controls for month-year of hire and site fixed effects.

-Fixed effects logit models discard sites where all hires are of one race or where the relevant race is not present.
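The lower observation counts in Panel A reflect the sample restriction in the last note: in a conditional (fixed-effects) logit, a site whose hires all have the same outcome contributes nothing to the conditional likelihood, so it is dropped. A minimal Python sketch of that filter follows; it assumes a hypothetical list of (site, outcome) pairs with the outcome coded 0/1 rather than 0/100, and the function name is illustrative.

```python
from collections import defaultdict


def fe_logit_sample(rows):
    """Drop sites whose hires all share the same race outcome.

    rows: list of (site_id, y) pairs with y = 1 if the hire is of the given race,
    else 0. Sites with no variation in y add nothing to the conditional likelihood.
    """
    outcomes = defaultdict(set)
    for site, y in rows:
        outcomes[site].add(y)
    informative = {site for site, ys in outcomes.items() if len(ys) > 1}
    return [(site, y) for site, y in rows if site in informative]


rows = [("A", 1), ("A", 0), ("B", 1), ("B", 1), ("C", 0)]
print(fe_logit_sample(rows))  # site B (all 1) and site C (all 0) are dropped
```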


Table 9: The Relationship Between Store Zip Code Demographics and Race of Hires Before and After Job Testing
Dependent Variable: An indicator variable equal to 100 if hired worker is of given race

                       (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)   (11)   (12)
                               White                       Black                     Hispanic
                       Pre   Post   Both   Both    Pre   Post   Both   Both    Pre   Post   Both   Both

Panel A: Race of Hires and Racial Composition of Store Zip Code
Share non-white in   -86.8  -85.6  -87.0          56.1   56.4   56.1          30.7   29.2   30.9
zip code             (2.3)  (3.4)  (2.2)         (3.5)  (5.0)  (3.3)         (3.0)  (4.3)  (2.8)
Share non-white in                   1.3   -0.2                  1.1    1.4                 -2.4   -1.2
zip code x post                    (3.3)  (1.8)                (4.9)  (1.7)                (4.5)  (1.6)
Site effects            No     No     No    Yes     No     No     No    Yes     No     No     No    Yes
State effects          Yes    Yes    Yes     No    Yes    Yes    Yes     No    Yes    Yes    Yes     No
R-squared            0.229  0.251  0.234  0.350  0.168  0.195  0.173  0.354  0.129  0.109  0.122  0.293
Obs                 25,820  8,427 34,247 34,247 25,820  8,427 34,247 34,247 25,820  8,427 34,247 34,247

Panel B: Race of Hires and Log Median Income in Store Zip Code
Log median income     31.7   39.2   31.9         -19.8  -22.9  -19.8         -11.9  -16.3  -12.2
in zip code          (2.5)  (3.1)  (2.4)         (2.5)  (3.2)  (2.4)         (1.6)  (2.5)  (1.6)
Log median income                    6.0    0.7                 -3.1   -0.4                 -2.9   -0.3
in zip code x post                 (3.8)  (1.6)                (3.7)  (1.4)                (2.8)  (1.2)
Site effects            No     No     No    Yes     No     No     No    Yes     No     No     No    Yes
State effects          Yes    Yes    Yes     No    Yes    Yes    Yes     No    Yes    Yes    Yes     No
R-squared            0.116  0.153  0.123  0.350  0.099  0.128  0.104  0.354  0.101  0.094  0.097  0.293
Obs                 25,820  8,427 34,247 34,247 25,820  8,427 34,247 34,247 25,820  8,427 34,247 34,247


Table  Notes: 

-Robust standard errors in parentheses account for correlations between observations from the same site (pre or post use of employment testing in models where both are included).

-Sample  includes  workers  hired  Jan  1999  through  May  2000. 

-All  models  include  controls  for  month-year  of  hire,  and  where  indicated,  1,363  site  fixed  effects  or  state  fixed  effects. 
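In the pooled "Both" columns with state effects, the coefficient on the zip-code characteristic gives its relationship with the race of hires before testing, and adding the "x post" interaction gives the implied relationship after testing. The Python sketch below does that addition and sets the result beside the separately estimated post-only column. The two need not match exactly, because the pooled model constrains the other coefficients to be common across periods. Values are copied from columns 2-3, 6-7, and 10-11 of the table; the variable names are illustrative.

```python
# (base slope, slope x post) from the pooled column with state effects,
# plus the separately estimated post-only slope, by race of hire.
T9_PANEL_A = {"White": (-87.0, 1.3, -85.6), "Black": (56.1, 1.1, 56.4), "Hispanic": (30.9, -2.4, 29.2)}
T9_PANEL_B = {"White": (31.9, 6.0, 39.2), "Black": (-19.8, -3.1, -22.9), "Hispanic": (-12.2, -2.9, -16.3)}


def implied_post_slopes(panel):
    """Post-testing slope implied by the pooled model: base slope plus interaction."""
    return {race: base + inter for race, (base, inter, _post_only) in panel.items()}


for name, panel in (("Share non-white", T9_PANEL_A), ("Log median income", T9_PANEL_B)):
    implied = implied_post_slopes(panel)
    for race, (_, _, post_only) in panel.items():
        print(f"{name}, {race}: implied {implied[race]:.1f} vs post-only {post_only:.1f}")
```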


Appendix Table 1. First Stage Models for Worker Receipt of Employment Test
Dependent Variable: Equal to one if hired worker received test

                      (1)       (2)       (3)       (4)
Store has           0.888     0.862     0.863     0.852
adopted test      (0.008)   (0.010)   (0.007)   (0.008)
Male                0.000     0.001    -0.001    -0.001
                  (0.002)   (0.002)   (0.001)   (0.001)
Black               0.002     0.004     0.000     0.000
                  (0.003)   (0.003)   (0.003)   (0.003)
Hispanic            0.008     0.006     0.003     0.003
                  (0.003)   (0.003)   (0.003)   (0.003)
State trends           No       Yes        No       Yes
Site effects           No        No       Yes       Yes
R-squared           0.892     0.895     0.909     0.910

Table  Notes: 

-N=34,247  includes  workers  hired  Jan  1999  through 

May  2000. 

-Robust  standard  errors  in  parentheses  account  for 

correlation  between  observations  from  the  same  site 

hired  under  each  screening  method  (testing  or  no 

testing). 

-All  models  include  controls  for  month-year  of  hire. 
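The first-stage coefficients show that, conditional on the controls, hires at sites that have begun testing are 85 to 89 percentage points more likely to have been tested. With a single excluded instrument, the first-stage F statistic equals the square of the instrument's t statistic, so instrument strength can be read off directly. The sketch below computes both from the rounded published coefficients and standard errors, so the results are approximate; all of them lie far above conventional weak-instrument thresholds such as F = 10.

```python
# Appendix Table 1: coefficient and robust SE on "Store has adopted test", by column.
FIRST_STAGE = {1: (0.888, 0.008), 2: (0.862, 0.010), 3: (0.863, 0.007), 4: (0.852, 0.008)}

for col, (coef, se) in FIRST_STAGE.items():
    t = coef / se
    # With one excluded instrument, the first-stage Wald F statistic is t squared.
    print(f"column ({col}): t = {t:.1f}, F = {t * t:,.0f}")
```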


Appendix Table 2. The Effect of Job Testing on Job Spell Duration: Lead and Lag Specifications
Dependent Variable: Length of Completed Employment Spell (days)

Month relative to
adoption of testing        (1)       (2)
5 months prior             6.3       5.6
                         (6.2)     (6.2)
4 months prior             8.0       7.5
                         (5.9)     (5.9)
3 months prior            -8.2      -7.8
                         (5.9)     (5.9)
2 months prior            -6.9      -6.2
                         (5.8)     (5.8)
1 month prior              8.0       8.8
                         (6.6)     (6.7)
Month of rollout          14.1      16.7
                         (6.6)     (6.6)
1 month post              28.3      31.8
                         (7.9)     (8.0)
2 months post             25.8      29.5
                         (8.3)     (8.5)
3 months post             18.6      24.4
                         (9.4)     (9.8)
4+ months post            20.8      32.1
                         (8.4)     (9.8)
State Trends                No       Yes
R-squared                0.110     0.112
Obs                     33,588    33,588

Table Notes:

-Robust  standard  errors  in  parentheses  account  for 

correlation  between  observations  from  the  same  site. 

-All  models  include  controls  for  month-year  of  hire. 

-Sample  includes  workers  hired  Jan  1999  through  May 

2000. 
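Each row of Appendix Table 2 is an indicator for a hire's month relative to the month in which the hiring site adopted testing, with months four and later pooled into "4+ months post". A minimal Python sketch of that binning follows, assuming dates are available as (year, month) pairs. The function names are hypothetical, and treating hires more than five months before adoption as the omitted reference group is an assumption, since the table does not state which category is omitted.

```python
from typing import Optional, Tuple


def month_index(year: int, month: int) -> int:
    """Running month count, so that differences between dates are in months."""
    return 12 * year + (month - 1)


def event_time_bin(hire: Tuple[int, int], adoption: Tuple[int, int]) -> Optional[str]:
    """Row of Appendix Table 2 that a hire falls into, given (year, month) dates.

    Returns None for hires more than five months before adoption; treating them
    as the omitted reference group is an assumption.
    """
    k = month_index(*hire) - month_index(*adoption)
    if k < -5:
        return None
    if k < 0:
        return f"{-k} month{'s' if k < -1 else ''} prior"
    if k == 0:
        return "Month of rollout"
    if k >= 4:
        return "4+ months post"
    return f"{k} month{'s' if k > 1 else ''} post"


print(event_time_bin((2000, 3), (2000, 6)))  # hire three months before adoption
```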


Appendix  Table  3.  The  Relationship  Between  Job  Spell  Duration  and  Store  Average  Job 

Test  Scores 
Dependent  Variable:  Length  of  employment  spell  (days) 


(1) 

(2) 

(3) 

(4) 

(5) 

(6) 

(7) 

No  Pre-Test 

Pre-Test 

All 

Mean  applicant  test 
score 

2.73 
(0.55) 

3.20 
(0.72) 

1.02 
(0.82) 

1.62 
(1.04) 

2.83 
(0.60) 

3.26 
(0.67) 

Mean  applicant  test 
score  X  PT 

-1.54 
(0.74) 

-1.25 

(0.62) 

Worker  received  pre- 
employment  test 

7.98 
(4.79) 

18.68 
(4.03) 

Share  non-white  in 
store  zip  code 

-3.42 
(12.36) 

-7.60 
(17.67) 

-2.60 
(10.16) 

-3.75 
(10.93) 

Log  median  income 
in  store  zip  code 

-15.40 
(7.32) 

-24.29 
(11.25) 

-17.14 
(6.16) 

-17.95 
(6.63) 

State  effects  Yes         Yes  No         Yes  Yes         Yes  No 

Site  effects  No  No  Yes  No  No  No         Yes 

R-squared  0.024       0.024         0.025       0.026         0.022       0.022       0.109 

Obs  25,347     25,347         8,241        8,241        33,588     33,588     33,588 

Table Notes:

-Robust standard errors in parentheses account for correlation between observations from the same site (and, in columns 4-6, hired under each screening method: testing or no testing).

-Tenure sample includes 33,588 workers hired Jan 1999 through May 2000.

-All models include dummies for gender, race, and year-month of hire.

-Applicant test sample includes all applications submitted from June 2000 through May 2001 at treatment sites (214,588 applicants total).